identity layers + randn queries
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 9.30s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_out_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16'),
finished after 4.22s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 9.32s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 11.34s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 11.49s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 5.98s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 8.71s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (65536, 512, 1, 'torch.float32', 'torch.float32'),
finished after 1.68s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_backward_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 5.00s,
best config selected: num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_2_reduce_grad_pseudo_query_kernel,
with key as (65536, 512, 'torch.float32', 'torch.float32'),
finished after 1.62s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 27.65s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (65536, 512, 8, 'torch.float32', 'torch.float32'),
finished after 1.64s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 26.03s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 19.84s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 12.79s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None;
production_forward fwd+bwd:  33.795 ms
production_forward bwd-only: 28.844 ms
production_forward peak allocated: fwd=1.112 GiB, fwd+bwd=5.114 GiB
production_forward peak reserved:  fwd=1.131 GiB, fwd+bwd=5.131 GiB
liger_forward fwd+bwd:  46.351 ms
liger_forward bwd-only: 33.967 ms
liger_forward peak allocated: fwd=7.665 GiB, fwd+bwd=7.665 GiB
liger_forward peak reserved:  fwd=7.670 GiB, fwd+bwd=7.982 GiB

/usr/local/lib/python3.12/dist-packages/torch/_inductor/lowering.py:7627: UserWarning: 
Online softmax is disabled on the fly since Inductor decides to
split the reduction. Cut an issue to PyTorch if this is an
important use case and you want to speed it up with online
softmax.

  warnings.warn(
/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py:321: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
/usr/local/lib/python3.12/dist-packages/torch/_inductor/select_algorithm.py:3464: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  current_size = base.storage().size()
E0428 19:04:39.259000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.259000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.259000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.259000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.259000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.259000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.259000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.265000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.265000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.265000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.265000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.265000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.265000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.265000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.269000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.269000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.269000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.269000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.269000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.269000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.269000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.274000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.274000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.274000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.274000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.274000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.274000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.274000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.281000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.281000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.281000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.281000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.281000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.281000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.281000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.286000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.286000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.286000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.286000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.286000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.286000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.286000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.291000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.291000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.291000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.291000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.291000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.291000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.291000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.296000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.296000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.296000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.296000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.296000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.296000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.296000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.301000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.301000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.301000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.301000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.301000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.301000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.301000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.306000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.306000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.306000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.306000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.306000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.306000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.306000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.311000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.311000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.311000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.311000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.311000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.311000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.311000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
E0428 19:04:39.316000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Runtime error during autotuning: 
E0428 19:04:39.316000 546 torch/_inductor/select_algorithm.py:3727] [0/1] CUDA driver error: invalid argument
E0428 19:04:39.316000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.316000 546 torch/_inductor/select_algorithm.py:3727] [0/1] This may mean this GPU is too small for max_autotune mode.
E0428 19:04:39.316000 546 torch/_inductor/select_algorithm.py:3727] [0/1] 
E0428 19:04:39.316000 546 torch/_inductor/select_algorithm.py:3727] [0/1] . 
E0428 19:04:39.316000 546 torch/_inductor/select_algorithm.py:3727] [0/1] Ignoring this choice.
Autotune Choices Stats:
{"num_choices": 13, "num_triton_choices": 12, "best_kernel": "bmm", "best_time": 0.8618559837341309, "best_triton_pos": 1, "best_triton_time": Infinity, "best_triton_kernel": "triton_bmm_0", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2"}
AUTOTUNE bmm(65536x2x1, 65536x1x512)
strides: [1, 65536, 0], [512, 0, 1]
dtypes: torch.float32, torch.float32
  bmm 0.8619 ms 100.0% 
  triton_bmm_0 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
  triton_bmm_1 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_bmm_2 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_bmm_3 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_bmm_4 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_bmm_5 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_bmm_6 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_bmm_7 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_bmm_8 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.2395 seconds and 0.0003 seconds precompiling for 13 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.08956799656152725, "best_triton_pos": 0}
AUTOTUNE mm(512x1, 1x131072)
strides: [1, 512], [0, 1]
dtypes: torch.float32, torch.float32
  triton_mm_17 0.0896 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_19 0.0897 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_24 0.0897 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_21 0.0900 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_22 0.0920 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_20 0.0924 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_23 0.0924 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_16 0.0934 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_18 0.0946 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_12 0.0952 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
SingleProcess AUTOTUNE benchmarking takes 0.3561 seconds and 0.2705 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.04633599892258644, "best_triton_pos": 0}
AUTOTUNE mm(512x1, 1x65536)
strides: [1, 512], [0, 1]
dtypes: torch.float32, torch.float32
  triton_mm_35 0.0463 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_36 0.0471 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_38 0.0471 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_34 0.0475 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_41 0.0476 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_37 0.0476 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_39 0.0481 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_40 0.0482 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_33 0.0486 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_42 0.0493 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.2509 seconds and 0.4189 seconds precompiling for 18 choices

paper_forward fwd+bwd:  112.766 ms
paper_forward bwd-only: 88.924 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB

Autotune Choices Stats:
{"num_choices": 17, "num_triton_choices": 16, "best_kernel": "mm", "best_time": 0.07734400033950806, "best_triton_pos": 1, "best_triton_time": 0.11049599945545197, "best_triton_kernel": "triton_mm_57", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(65536x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 0.0773 ms 100.0% 
  triton_mm_57 0.1105 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_60 0.1120 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_53 0.1772 ms 43.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_47 0.1776 ms 43.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_54 0.1778 ms 43.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_58 0.1784 ms 43.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_55 0.1789 ms 43.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_49 0.1794 ms 43.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_52 0.1812 ms 42.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.5086 seconds and 0.5048 seconds precompiling for 17 choices
Autotune Choices Stats:
{"num_choices": 17, "num_triton_choices": 16, "best_kernel": "mm", "best_time": 0.13760000467300415, "best_triton_pos": 1, "best_triton_time": 0.2125760018825531, "best_triton_kernel": "triton_mm_73", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(131072x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 0.1376 ms 100.0% 
  triton_mm_73 0.2126 ms 64.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_76 0.2156 ms 63.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_69 0.3472 ms 39.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_63 0.3472 ms 39.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_70 0.3498 ms 39.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_74 0.3502 ms 39.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_71 0.3514 ms 39.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_65 0.3523 ms 39.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_68 0.3535 ms 38.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.5740 seconds and 0.5289 seconds precompiling for 17 choices
Autotune Choices Stats:
{"num_choices": 6, "num_triton_choices": 0, "best_kernel": "decompose_k_mm_128_split_3", "best_kernel_desc": "k_split=128", "best_time": 0.12777599692344666}
AUTOTUNE mm(512x131072, 131072x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  decompose_k_mm_128_split_3 0.1278 ms 100.0% k_split=128
  decompose_k_mm_256_split_4 0.1321 ms 96.8% k_split=256
  decompose_k_mm_64_split_2 0.1339 ms 95.4% k_split=64
  mm 0.1499 ms 85.3% 
  decompose_k_mm_32_split_1 0.1998 ms 63.9% k_split=32
  decompose_k_mm_16_split_0 0.3077 ms 41.5% k_split=16
SingleProcess AUTOTUNE benchmarking takes 2.9608 seconds and 0.0003 seconds precompiling for 6 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_87", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.09510400146245956, "best_triton_pos": 0}
AUTOTUNE mm(131072x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_87 0.0951 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_83 0.0952 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_85 0.0956 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_90 0.0956 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_89 0.0958 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_86 0.0967 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_88 0.0968 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_80 0.1091 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8
  triton_mm_82 0.1139 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_78 0.1140 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
SingleProcess AUTOTUNE benchmarking takes 0.3872 seconds and 0.0002 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_107", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.2122880071401596, "best_triton_pos": 0}
AUTOTUNE mm(327680x1, 1x512)
strides: [1, 0], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_107 0.2123 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_100 0.2124 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_102 0.2124 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_104 0.2124 ms 99.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_106 0.2137 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_103 0.2142 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_105 0.2142 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_99 0.2158 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_101 0.2170 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_111 0.2211 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.5931 seconds and 0.2811 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 8, "num_triton_choices": 0, "best_kernel": "decompose_k_mm_64_split_10", "best_kernel_desc": "k_split=64", "best_time": 0.0759039968252182}
AUTOTUNE mm(512x65536, 65536x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  decompose_k_mm_64_split_10 0.0759 ms 100.0% k_split=64
  decompose_k_mm_128_split_11 0.0774 ms 98.1% k_split=128
  mm 0.0918 ms 82.7% 
  decompose_k_mm_32_split_9 0.1099 ms 69.1% k_split=32
  decompose_k_mm_16_split_8 0.1642 ms 46.2% k_split=16
  decompose_k_mm_8_split_7 0.2987 ms 25.4% k_split=8
  decompose_k_mm_4_split_6 0.5745 ms 13.2% k_split=4
  decompose_k_mm_2_split_5 1.1468 ms 6.6% k_split=2
SingleProcess AUTOTUNE benchmarking takes 1.9872 seconds and 0.0003 seconds precompiling for 8 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_117", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.049984000623226166, "best_triton_pos": 0}
AUTOTUNE mm(65536x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_117 0.0500 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_121 0.0502 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_119 0.0505 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_124 0.0505 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_123 0.0506 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_120 0.0507 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_122 0.0509 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  mm 0.0553 ms 90.4% 
  triton_mm_116 0.0569 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_118 0.0588 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.2688 seconds and 0.0002 seconds precompiling for 18 choices

torch_compile_phases_forward fwd+bwd:  48.519 ms
torch_compile_phases_forward bwd-only: 39.164 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.752 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016873168060556054, max_abs=0.041015625
production_forward grad[0] vs paper_forward: mean_abs=0.008508625440299511, max_abs=0.3359375, mean_rel=0.07265828549861908, max_rel=129.26316833496094, norm_rel=0.01980203576385975, ref_abs_avg=0.46529603004455566, test_abs_avg=0.4653090238571167
production_forward grad[1] vs paper_forward: mean_abs=5.304624080657959, max_abs=40.0, mean_rel=0.1482349932193756, max_rel=143.6104278564453, norm_rel=0.02075040712952614, ref_abs_avg=230.60047912597656, test_abs_avg=230.626220703125
production_forward grad[2] vs paper_forward: mean_abs=0.8391175270080566, max_abs=3.03125, mean_rel=0.15847139060497284, max_rel=21.842153549194336, norm_rel=0.02222779206931591, ref_abs_avg=37.34083938598633, test_abs_avg=37.42578887939453
production_forward grad[3] vs paper_forward: mean_abs=1.0554871559143066, max_abs=6.5, mean_rel=0.1505568027496338, max_rel=773.34375, norm_rel=0.02288578264415264, ref_abs_avg=46.32328796386719, test_abs_avg=46.32438659667969
production_forward grad[4] vs paper_forward: mean_abs=1.0292997360229492, max_abs=6.5, mean_rel=0.15623582899570465, max_rel=803.5714111328125, norm_rel=0.02255704253911972, ref_abs_avg=45.890220642089844, test_abs_avg=45.89463806152344
production_forward grad[5] vs paper_forward: mean_abs=0.7611246109008789, max_abs=3.5, mean_rel=0.07540443539619446, max_rel=3.9995405673980713, norm_rel=0.022428041324019432, ref_abs_avg=34.629417419433594, test_abs_avg=34.678585052490234
production_forward grad[6] vs paper_forward: mean_abs=0.9252321124076843, max_abs=6.0, mean_rel=0.15741553902626038, max_rel=1661.021240234375, norm_rel=0.022646725177764893, ref_abs_avg=41.04568862915039, test_abs_avg=41.04540252685547
production_forward grad[7] vs paper_forward: mean_abs=0.9039989709854126, max_abs=6.0, mean_rel=0.17569811642169952, max_rel=1973.587890625, norm_rel=0.022367097437381744, ref_abs_avg=40.641990661621094, test_abs_avg=40.64125061035156
production_forward grad[8] vs paper_forward: mean_abs=0.7086551189422607, max_abs=3.25, mean_rel=0.18715927004814148, max_rel=29.06647491455078, norm_rel=0.023703621700406075, ref_abs_avg=29.699066162109375, test_abs_avg=29.681564331054688
production_forward grad[9] vs paper_forward: mean_abs=0.8518524169921875, max_abs=5.75, mean_rel=0.17234501242637634, max_rel=1935.433837890625, norm_rel=0.02248889021575451, ref_abs_avg=38.072540283203125, test_abs_avg=38.077030181884766
production_forward grad[10] vs paper_forward: mean_abs=0.8318246006965637, max_abs=5.0, mean_rel=0.13150593638420105, max_rel=385.8944396972656, norm_rel=0.022228892892599106, ref_abs_avg=37.6262092590332, test_abs_avg=37.625221252441406
production_forward grad[11] vs paper_forward: mean_abs=0.6954784393310547, max_abs=2.5, mean_rel=0.10666052252054214, max_rel=15.037700653076172, norm_rel=0.023270323872566223, ref_abs_avg=30.04815673828125, test_abs_avg=30.083799362182617
production_forward grad[12] vs paper_forward: mean_abs=0.797041654586792, max_abs=5.0, mean_rel=0.15488433837890625, max_rel=1317.93603515625, norm_rel=0.02237618900835514, ref_abs_avg=35.80322265625, test_abs_avg=35.804649353027344
production_forward grad[13] vs paper_forward: mean_abs=0.7765485048294067, max_abs=4.875, mean_rel=0.15517546236515045, max_rel=1034.5, norm_rel=0.022114578634500504, ref_abs_avg=35.310455322265625, test_abs_avg=35.31200408935547
production_forward grad[14] vs paper_forward: mean_abs=0.6162624359130859, max_abs=2.59375, mean_rel=0.14379720389842987, max_rel=16.85886001586914, norm_rel=0.022747822105884552, ref_abs_avg=26.905067443847656, test_abs_avg=26.896244049072266
production_forward grad[15] vs paper_forward: mean_abs=0.7467018365859985, max_abs=4.75, mean_rel=0.15131300687789917, max_rel=1611.2093505859375, norm_rel=0.02219756878912449, ref_abs_avg=33.804931640625, test_abs_avg=33.809261322021484
production_forward grad[16] vs paper_forward: mean_abs=0.7277178764343262, max_abs=4.5, mean_rel=0.16802290081977844, max_rel=1923.41357421875, norm_rel=0.02208220213651657, ref_abs_avg=33.16716384887695, test_abs_avg=33.171451568603516
production_forward grad[17] vs paper_forward: mean_abs=0.5331583023071289, max_abs=2.25, mean_rel=0.08315963298082352, max_rel=11.545424461364746, norm_rel=0.02004508301615715, ref_abs_avg=26.987693786621094, test_abs_avg=26.96642303466797
production_forward grad[18] vs paper_forward: mean_abs=0.6995103359222412, max_abs=4.0, mean_rel=0.14881783723831177, max_rel=1475.660888671875, norm_rel=0.022026153281331062, ref_abs_avg=31.88425064086914, test_abs_avg=31.887035369873047
production_forward grad[19] vs paper_forward: mean_abs=0.6820409297943115, max_abs=4.375, mean_rel=0.1459776610136032, max_rel=791.8738403320312, norm_rel=0.021733449772000313, ref_abs_avg=31.567081451416016, test_abs_avg=31.56690216064453
production_forward grad[20] vs paper_forward: mean_abs=0.5291553735733032, max_abs=2.25, mean_rel=0.08224859088659286, max_rel=6.2138214111328125, norm_rel=0.021755868569016457, ref_abs_avg=24.490779876708984, test_abs_avg=24.456390380859375
production_forward grad[21] vs paper_forward: mean_abs=0.6598186492919922, max_abs=4.21875, mean_rel=0.15475398302078247, max_rel=1085.0052490234375, norm_rel=0.021860307082533836, ref_abs_avg=30.289350509643555, test_abs_avg=30.29127311706543
production_forward grad[22] vs paper_forward: mean_abs=0.6470494270324707, max_abs=4.25, mean_rel=0.1424248218536377, max_rel=764.75634765625, norm_rel=0.021691735833883286, ref_abs_avg=29.978593826293945, test_abs_avg=29.977590560913086
production_forward grad[23] vs paper_forward: mean_abs=0.5245795249938965, max_abs=2.0, mean_rel=0.1808241307735443, max_rel=25.521427154541016, norm_rel=0.022411860525608063, ref_abs_avg=22.9495849609375, test_abs_avg=22.958599090576172
production_forward grad[24] vs paper_forward: mean_abs=0.6341220140457153, max_abs=4.0, mean_rel=0.13503527641296387, max_rel=703.0328369140625, norm_rel=0.02192053571343422, ref_abs_avg=29.071895599365234, test_abs_avg=29.07230567932129
production_forward grad[25] vs paper_forward: mean_abs=0.6170370578765869, max_abs=3.875, mean_rel=0.13149812817573547, max_rel=541.12744140625, norm_rel=0.021532947197556496, ref_abs_avg=28.758041381835938, test_abs_avg=28.75968360900879
production_forward grad[26] vs paper_forward: mean_abs=0.5748848915100098, max_abs=2.46875, mean_rel=0.1380167007446289, max_rel=23.910654067993164, norm_rel=0.02300436608493328, ref_abs_avg=25.437503814697266, test_abs_avg=25.450517654418945
production_forward grad[27] vs paper_forward: mean_abs=0.7397623062133789, max_abs=5.4375, mean_rel=0.15260976552963257, max_rel=731.7556762695312, norm_rel=0.02395494095981121, ref_abs_avg=31.020503997802734, test_abs_avg=31.0189151763916
production_forward grad[28] vs paper_forward: mean_abs=0.7190940380096436, max_abs=4.5, mean_rel=0.1655818372964859, max_rel=1259.9154052734375, norm_rel=0.023687299340963364, ref_abs_avg=30.54116439819336, test_abs_avg=30.545169830322266
production_forward grad[29] vs paper_forward: mean_abs=0.574661374092102, max_abs=2.125, mean_rel=0.922466516494751, max_rel=433.2278747558594, norm_rel=0.02364390715956688, ref_abs_avg=24.395103454589844, test_abs_avg=24.35857391357422
production_forward grad[30] vs paper_forward: mean_abs=0.688830554485321, max_abs=4.375, mean_rel=0.15673035383224487, max_rel=1106.4281005859375, norm_rel=0.024139340966939926, ref_abs_avg=28.613739013671875, test_abs_avg=28.613941192626953
production_forward grad[31] vs paper_forward: mean_abs=0.6779468059539795, max_abs=4.0, mean_rel=0.17272678017616272, max_rel=1503.7767333984375, norm_rel=0.024197611957788467, ref_abs_avg=28.16604232788086, test_abs_avg=28.16439437866211
production_forward grad[32] vs paper_forward: mean_abs=0.5354719161987305, max_abs=1.9375, mean_rel=0.1029617041349411, max_rel=9.623586654663086, norm_rel=0.023898236453533173, ref_abs_avg=22.353845596313477, test_abs_avg=22.30649185180664
production_forward grad[33] vs paper_forward: mean_abs=0.6363619565963745, max_abs=4.0, mean_rel=0.16173896193504333, max_rel=751.43017578125, norm_rel=0.023973021656274796, ref_abs_avg=26.633140563964844, test_abs_avg=26.63418197631836
production_forward grad[34] vs paper_forward: mean_abs=0.6259596943855286, max_abs=3.75, mean_rel=0.1606101542711258, max_rel=1754.9866943359375, norm_rel=0.023830363526940346, ref_abs_avg=26.390573501586914, test_abs_avg=26.395307540893555
production_forward grad[35] vs paper_forward: mean_abs=0.5234905481338501, max_abs=2.125, mean_rel=0.10226403176784515, max_rel=6.170218467712402, norm_rel=0.02460869960486889, ref_abs_avg=21.47699546813965, test_abs_avg=21.439472198486328
production_forward grad[36] vs paper_forward: mean_abs=0.6014806032180786, max_abs=3.859375, mean_rel=0.15660305321216583, max_rel=923.1045532226562, norm_rel=0.02370717190206051, ref_abs_avg=25.4411678314209, test_abs_avg=25.43895721435547
production_forward grad[37] vs paper_forward: mean_abs=0.5843721628189087, max_abs=3.75, mean_rel=0.15998752415180206, max_rel=928.8887329101562, norm_rel=0.0237687099725008, ref_abs_avg=24.652324676513672, test_abs_avg=24.649879455566406
production_forward grad[38] vs paper_forward: mean_abs=0.44517067074775696, max_abs=2.0, mean_rel=0.20530661940574646, max_rel=48.15308380126953, norm_rel=0.024526627734303474, ref_abs_avg=18.75326919555664, test_abs_avg=18.774036407470703
production_forward grad[39] vs paper_forward: mean_abs=0.5654049515724182, max_abs=4.25, mean_rel=0.15726137161254883, max_rel=1661.531005859375, norm_rel=0.023461969569325447, ref_abs_avg=24.157575607299805, test_abs_avg=24.158401489257812
production_forward grad[40] vs paper_forward: mean_abs=0.5608412027359009, max_abs=4.0, mean_rel=0.15294933319091797, max_rel=958.451416015625, norm_rel=0.023654522374272346, ref_abs_avg=23.776531219482422, test_abs_avg=23.76736831665039
production_forward grad[41] vs paper_forward: mean_abs=0.45808887481689453, max_abs=1.75, mean_rel=0.1118929386138916, max_rel=7.4733123779296875, norm_rel=0.023656900972127914, ref_abs_avg=19.705242156982422, test_abs_avg=19.676429748535156
production_forward grad[42] vs paper_forward: mean_abs=0.5420186519622803, max_abs=3.375, mean_rel=0.1532084047794342, max_rel=776.5128784179688, norm_rel=0.023317543789744377, ref_abs_avg=23.268091201782227, test_abs_avg=23.267379760742188
production_forward grad[43] vs paper_forward: mean_abs=0.5264580845832825, max_abs=3.25, mean_rel=0.14732617139816284, max_rel=803.6486206054688, norm_rel=0.02314784936606884, ref_abs_avg=22.796390533447266, test_abs_avg=22.796485900878906
production_forward grad[44] vs paper_forward: mean_abs=0.40175342559814453, max_abs=1.5625, mean_rel=0.1350993812084198, max_rel=9.051862716674805, norm_rel=0.021695857867598534, ref_abs_avg=18.7738037109375, test_abs_avg=18.796865463256836
production_forward grad[45] vs paper_forward: mean_abs=0.5164949893951416, max_abs=3.4375, mean_rel=0.14916154742240906, max_rel=760.5988159179688, norm_rel=0.023249825462698936, ref_abs_avg=22.25717544555664, test_abs_avg=22.2563533782959
production_forward grad[46] vs paper_forward: mean_abs=0.5033577084541321, max_abs=3.0, mean_rel=0.14480432868003845, max_rel=866.3734741210938, norm_rel=0.02295731008052826, ref_abs_avg=21.95943260192871, test_abs_avg=21.959218978881836
production_forward grad[47] vs paper_forward: mean_abs=0.3776921033859253, max_abs=1.5, mean_rel=0.4219937324523926, max_rel=159.4615020751953, norm_rel=0.02181382104754448, ref_abs_avg=17.330324172973633, test_abs_avg=17.339710235595703
production_forward grad[48] vs paper_forward: mean_abs=0.4904592037200928, max_abs=3.75, mean_rel=0.15186260640621185, max_rel=958.679443359375, norm_rel=0.02296961285173893, ref_abs_avg=21.37944793701172, test_abs_avg=21.377723693847656
production_forward grad[49] vs paper_forward: mean_abs=0.48131901025772095, max_abs=3.0, mean_rel=0.1433015763759613, max_rel=1061.0125732421875, norm_rel=0.022908838465809822, ref_abs_avg=21.056673049926758, test_abs_avg=21.05936050415039
production_forward grad[50] vs paper_forward: mean_abs=0.4658176898956299, max_abs=2.125, mean_rel=0.08410674333572388, max_rel=4.486248970031738, norm_rel=0.02620418183505535, ref_abs_avg=18.126155853271484, test_abs_avg=18.111000061035156
production_forward grad[51] vs paper_forward: mean_abs=0.5474926233291626, max_abs=3.8125, mean_rel=0.17423135042190552, max_rel=1429.69873046875, norm_rel=0.024528831243515015, ref_abs_avg=22.386600494384766, test_abs_avg=22.389123916625977
production_forward grad[52] vs paper_forward: mean_abs=0.53779536485672, max_abs=3.75, mean_rel=0.1780482530593872, max_rel=996.8314819335938, norm_rel=0.02443760447204113, ref_abs_avg=22.063663482666016, test_abs_avg=22.06500816345215
production_forward grad[53] vs paper_forward: mean_abs=0.38828158378601074, max_abs=1.6875, mean_rel=0.16752417385578156, max_rel=22.77250099182129, norm_rel=0.02277449704706669, ref_abs_avg=17.388042449951172, test_abs_avg=17.399662017822266
production_forward grad[54] vs paper_forward: mean_abs=0.49810755252838135, max_abs=3.375, mean_rel=0.15448805689811707, max_rel=774.646240234375, norm_rel=0.023967793211340904, ref_abs_avg=20.806182861328125, test_abs_avg=20.806743621826172
production_forward grad[55] vs paper_forward: mean_abs=0.48934051394462585, max_abs=3.96875, mean_rel=0.15505436062812805, max_rel=999.3005981445312, norm_rel=0.0238087996840477, ref_abs_avg=20.584383010864258, test_abs_avg=20.584430694580078
production_forward grad[56] vs paper_forward: mean_abs=0.36173343658447266, max_abs=1.5, mean_rel=0.1299360990524292, max_rel=14.825672149658203, norm_rel=0.022618353366851807, ref_abs_avg=16.238204956054688, test_abs_avg=16.259750366210938
production_forward grad[57] vs paper_forward: mean_abs=0.46042636036872864, max_abs=2.875, mean_rel=0.1432121992111206, max_rel=596.3449096679688, norm_rel=0.023531708866357803, ref_abs_avg=19.573219299316406, test_abs_avg=19.573942184448242
production_forward grad[58] vs paper_forward: mean_abs=0.4543246626853943, max_abs=3.41796875, mean_rel=0.1517183929681778, max_rel=551.34716796875, norm_rel=0.023109745234251022, ref_abs_avg=19.682209014892578, test_abs_avg=19.683242797851562
production_forward grad[59] vs paper_forward: mean_abs=0.36345142126083374, max_abs=1.5, mean_rel=0.6582014560699463, max_rel=122.79462432861328, norm_rel=0.02408355288207531, ref_abs_avg=15.051884651184082, test_abs_avg=15.075098991394043
production_forward grad[60] vs paper_forward: mean_abs=0.43364930152893066, max_abs=3.0, mean_rel=0.14147081971168518, max_rel=576.3591918945312, norm_rel=0.023089226335287094, ref_abs_avg=18.791175842285156, test_abs_avg=18.79184341430664
production_forward grad[61] vs paper_forward: mean_abs=0.42660191655158997, max_abs=2.9375, mean_rel=0.15064261853694916, max_rel=746.0355224609375, norm_rel=0.0224622692912817, ref_abs_avg=18.959678649902344, test_abs_avg=18.962848663330078
production_forward grad[62] vs paper_forward: mean_abs=0.33908432722091675, max_abs=1.3125, mean_rel=0.11611015349626541, max_rel=7.685415267944336, norm_rel=0.0240013487637043, ref_abs_avg=14.244507789611816, test_abs_avg=14.26272201538086
production_forward grad[63] vs paper_forward: mean_abs=0.4098309278488159, max_abs=3.0, mean_rel=0.15080353617668152, max_rel=532.2686157226562, norm_rel=0.022868119180202484, ref_abs_avg=17.914640426635742, test_abs_avg=17.915739059448242
production_forward grad[64] vs paper_forward: mean_abs=0.40543439984321594, max_abs=2.75, mean_rel=0.1427783966064453, max_rel=688.6519165039062, norm_rel=0.022678017616271973, ref_abs_avg=17.875686645507812, test_abs_avg=17.87143325805664
production_forward grad[65] vs paper_forward: mean_abs=0.32575464248657227, max_abs=1.169921875, mean_rel=0.0983891636133194, max_rel=10.32004165649414, norm_rel=0.02192670665681362, ref_abs_avg=14.788248062133789, test_abs_avg=14.779285430908203
production_forward grad[66] vs paper_forward: mean_abs=0.3920862674713135, max_abs=2.875, mean_rel=0.13714653253555298, max_rel=490.4615783691406, norm_rel=0.022616194561123848, ref_abs_avg=17.350460052490234, test_abs_avg=17.351465225219727
production_forward grad[67] vs paper_forward: mean_abs=0.3882255554199219, max_abs=2.75, mean_rel=0.1393449455499649, max_rel=741.1775512695312, norm_rel=0.022229332476854324, ref_abs_avg=17.46544647216797, test_abs_avg=17.47092056274414
production_forward grad[68] vs paper_forward: mean_abs=0.288965106010437, max_abs=1.15625, mean_rel=0.06965339183807373, max_rel=1.3388957977294922, norm_rel=0.02127520926296711, ref_abs_avg=13.741191864013672, test_abs_avg=13.770292282104492
production_forward grad[69] vs paper_forward: mean_abs=0.36839762330055237, max_abs=2.6875, mean_rel=0.13482120633125305, max_rel=746.249755859375, norm_rel=0.021996304392814636, ref_abs_avg=16.732677459716797, test_abs_avg=16.73308753967285
production_forward grad[70] vs paper_forward: mean_abs=0.3612602949142456, max_abs=2.875, mean_rel=0.1335471272468567, max_rel=709.4808349609375, norm_rel=0.021275930106639862, ref_abs_avg=16.944808959960938, test_abs_avg=16.946359634399414
production_forward grad[71] vs paper_forward: mean_abs=0.30230069160461426, max_abs=1.1875, mean_rel=0.14488303661346436, max_rel=22.715694427490234, norm_rel=0.021701598539948463, ref_abs_avg=14.323253631591797, test_abs_avg=14.344186782836914
production_forward grad[72] vs paper_forward: mean_abs=0.36036592721939087, max_abs=2.9375, mean_rel=0.14683854579925537, max_rel=666.8978881835938, norm_rel=0.021611412987113, ref_abs_avg=16.622827529907227, test_abs_avg=16.622587203979492
production_forward grad[73] vs paper_forward: mean_abs=0.3480094075202942, max_abs=2.6875, mean_rel=0.12769639492034912, max_rel=430.02337646484375, norm_rel=0.02136480063199997, ref_abs_avg=16.290050506591797, test_abs_avg=16.287490844726562
production_forward grad[74] vs paper_forward: mean_abs=0.33600836992263794, max_abs=1.421875, mean_rel=0.10575584322214127, max_rel=6.449854850769043, norm_rel=0.022870641201734543, ref_abs_avg=14.494532585144043, test_abs_avg=14.46933364868164
production_forward grad[75] vs paper_forward: mean_abs=0.39997878670692444, max_abs=2.765625, mean_rel=0.1469905525445938, max_rel=859.6769409179688, norm_rel=0.022928478196263313, ref_abs_avg=17.470792770385742, test_abs_avg=17.469213485717773
production_forward grad[76] vs paper_forward: mean_abs=0.38957899808883667, max_abs=2.8125, mean_rel=0.15386056900024414, max_rel=693.27685546875, norm_rel=0.022566650062799454, ref_abs_avg=17.298702239990234, test_abs_avg=17.29587173461914
production_forward grad[77] vs paper_forward: mean_abs=0.3007845878601074, max_abs=1.03125, mean_rel=0.07469229400157928, max_rel=7.309941291809082, norm_rel=0.021470332518219948, ref_abs_avg=13.766035079956055, test_abs_avg=13.75972843170166
production_forward grad[78] vs paper_forward: mean_abs=0.36836373805999756, max_abs=3.5, mean_rel=0.14322903752326965, max_rel=1004.5645751953125, norm_rel=0.022451981902122498, ref_abs_avg=16.374177932739258, test_abs_avg=16.371856689453125
production_forward grad[79] vs paper_forward: mean_abs=0.35619452595710754, max_abs=2.7265625, mean_rel=0.1511325240135193, max_rel=912.2745361328125, norm_rel=0.021835708990693092, ref_abs_avg=16.300376892089844, test_abs_avg=16.300567626953125
production_forward grad[80] vs paper_forward: mean_abs=0.3112049102783203, max_abs=1.1875, mean_rel=0.19278410077095032, max_rel=34.511749267578125, norm_rel=0.022828854620456696, ref_abs_avg=13.549736976623535, test_abs_avg=13.55640697479248
production_forward grad[81] vs paper_forward: mean_abs=0.3421371579170227, max_abs=3.125, mean_rel=0.1382778435945511, max_rel=869.289306640625, norm_rel=0.02151324972510338, ref_abs_avg=15.880409240722656, test_abs_avg=15.878761291503906
production_forward grad[82] vs paper_forward: mean_abs=0.33333620429039, max_abs=3.25, mean_rel=0.13591820001602173, max_rel=296.9539489746094, norm_rel=0.021043449640274048, ref_abs_avg=15.75946044921875, test_abs_avg=15.767324447631836
production_forward grad[83] vs paper_forward: mean_abs=0.2648897171020508, max_abs=0.875, mean_rel=0.10713667422533035, max_rel=14.775432586669922, norm_rel=0.020298846065998077, ref_abs_avg=12.781896591186523, test_abs_avg=12.790979385375977
production_forward grad[84] vs paper_forward: mean_abs=0.31965118646621704, max_abs=3.0, mean_rel=0.1365099847316742, max_rel=601.8687133789062, norm_rel=0.02098040282726288, ref_abs_avg=15.251559257507324, test_abs_avg=15.251084327697754
production_forward grad[85] vs paper_forward: mean_abs=0.31217843294143677, max_abs=2.625, mean_rel=0.1302044689655304, max_rel=582.70947265625, norm_rel=0.021426834166049957, ref_abs_avg=14.715194702148438, test_abs_avg=14.72176742553711
production_forward grad[86] vs paper_forward: mean_abs=0.25240492820739746, max_abs=1.0625, mean_rel=0.07561751455068588, max_rel=5.477896690368652, norm_rel=0.019896266981959343, ref_abs_avg=12.907289505004883, test_abs_avg=12.891115188598633
production_forward grad[87] vs paper_forward: mean_abs=0.30290210247039795, max_abs=3.5, mean_rel=0.12496007978916168, max_rel=566.0729370117188, norm_rel=0.02035733126103878, ref_abs_avg=14.945708274841309, test_abs_avg=14.94530200958252
production_forward grad[88] vs paper_forward: mean_abs=0.29361221194267273, max_abs=3.00390625, mean_rel=0.13262949883937836, max_rel=501.3099670410156, norm_rel=0.02020406164228916, ref_abs_avg=14.591560363769531, test_abs_avg=14.598139762878418
production_forward grad[89] vs paper_forward: mean_abs=0.22228765487670898, max_abs=1.125, mean_rel=0.10157281160354614, max_rel=9.291483879089355, norm_rel=0.019578274339437485, ref_abs_avg=11.82455062866211, test_abs_avg=11.839418411254883
production_forward grad[90] vs paper_forward: mean_abs=0.2873665690422058, max_abs=2.8125, mean_rel=0.12467785179615021, max_rel=595.3994750976562, norm_rel=0.020130695775151253, ref_abs_avg=14.370733261108398, test_abs_avg=14.370705604553223
production_forward grad[91] vs paper_forward: mean_abs=0.2765257954597473, max_abs=2.75, mean_rel=0.12941160798072815, max_rel=656.0404663085938, norm_rel=0.019866405054926872, ref_abs_avg=14.044838905334473, test_abs_avg=14.03913688659668
production_forward grad[92] vs paper_forward: mean_abs=0.22627592086791992, max_abs=0.9609375, mean_rel=0.1399436593055725, max_rel=30.21491813659668, norm_rel=0.019732041284441948, ref_abs_avg=11.655762672424316, test_abs_avg=11.66483211517334
production_forward grad[93] vs paper_forward: mean_abs=0.26734107732772827, max_abs=3.0, mean_rel=0.1157648116350174, max_rel=648.505126953125, norm_rel=0.01976255141198635, ref_abs_avg=13.677364349365234, test_abs_avg=13.676445007324219
production_forward grad[94] vs paper_forward: mean_abs=0.26662394404411316, max_abs=2.5, mean_rel=0.12501010298728943, max_rel=603.9622192382812, norm_rel=0.019550641998648643, ref_abs_avg=13.74953556060791, test_abs_avg=13.747574806213379
production_forward grad[95] vs paper_forward: mean_abs=0.21181821823120117, max_abs=0.84375, mean_rel=0.1327722668647766, max_rel=10.435389518737793, norm_rel=0.0175229050219059, ref_abs_avg=12.372425079345703, test_abs_avg=12.366706848144531
production_forward grad[96] vs paper_forward: mean_abs=0.2586138844490051, max_abs=3.625, mean_rel=0.11980822682380676, max_rel=446.2440490722656, norm_rel=0.019537387415766716, ref_abs_avg=13.46575927734375, test_abs_avg=13.464113235473633
production_forward grad[97] vs paper_forward: mean_abs=0.2545606195926666, max_abs=2.8125, mean_rel=0.11782506108283997, max_rel=403.1797180175781, norm_rel=0.01956063136458397, ref_abs_avg=13.246963500976562, test_abs_avg=13.248273849487305
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016906457021832466, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00885527953505516, max_abs=0.328125, mean_rel=0.0752817690372467, max_rel=135.86611938476562, norm_rel=0.02048237808048725, ref_abs_avg=0.46529603004455566, test_abs_avg=0.46529874205589294
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.36603307723999, max_abs=40.0, mean_rel=0.150414377450943, max_rel=259.7762145996094, norm_rel=0.021074671298265457, ref_abs_avg=230.60047912597656, test_abs_avg=230.61749267578125
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.896026611328125, max_abs=3.25, mean_rel=0.16862735152244568, max_rel=19.955530166625977, norm_rel=0.02406371757388115, ref_abs_avg=37.34083938598633, test_abs_avg=37.380104064941406
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.0914545059204102, max_abs=6.90625, mean_rel=0.1591595709323883, max_rel=994.364013671875, norm_rel=0.023685159161686897, ref_abs_avg=46.32328796386719, test_abs_avg=46.321678161621094
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.0676560401916504, max_abs=8.0, mean_rel=0.16027897596359253, max_rel=803.5714111328125, norm_rel=0.023390866816043854, ref_abs_avg=45.890220642089844, test_abs_avg=45.89564514160156
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.8155074119567871, max_abs=3.25, mean_rel=0.07961035519838333, max_rel=3.629114866256714, norm_rel=0.023188132792711258, ref_abs_avg=34.629417419433594, test_abs_avg=34.66446304321289
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9548461437225342, max_abs=6.0, mean_rel=0.16517871618270874, max_rel=921.7247314453125, norm_rel=0.023376459255814552, ref_abs_avg=41.04568862915039, test_abs_avg=41.04267120361328
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.9352021217346191, max_abs=5.5625, mean_rel=0.1748138815164566, max_rel=1215.583251953125, norm_rel=0.02311103790998459, ref_abs_avg=40.641990661621094, test_abs_avg=40.64280700683594
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7524454593658447, max_abs=3.0, mean_rel=0.17138859629631042, max_rel=20.250776290893555, norm_rel=0.02546677552163601, ref_abs_avg=29.699066162109375, test_abs_avg=29.655254364013672
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.8780537843704224, max_abs=5.5, mean_rel=0.17395181953907013, max_rel=1551.704833984375, norm_rel=0.023179862648248672, ref_abs_avg=38.072540283203125, test_abs_avg=38.07560348510742
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.860387921333313, max_abs=5.5, mean_rel=0.1392633318901062, max_rel=842.5930786132812, norm_rel=0.022976119071245193, ref_abs_avg=37.6262092590332, test_abs_avg=37.624820709228516
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6987323760986328, max_abs=3.0, mean_rel=0.12139184772968292, max_rel=14.921721458435059, norm_rel=0.0239361934363842, ref_abs_avg=30.04815673828125, test_abs_avg=30.07102394104004
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.8200623989105225, max_abs=5.5, mean_rel=0.15918323397636414, max_rel=986.5968017578125, norm_rel=0.0230218768119812, ref_abs_avg=35.80322265625, test_abs_avg=35.80461120605469
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7999510765075684, max_abs=5.0, mean_rel=0.15603142976760864, max_rel=1000.9395141601562, norm_rel=0.022775782272219658, ref_abs_avg=35.310455322265625, test_abs_avg=35.311248779296875
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.6172400712966919, max_abs=2.75, mean_rel=0.10511336475610733, max_rel=11.93689250946045, norm_rel=0.02288217470049858, ref_abs_avg=26.905067443847656, test_abs_avg=26.904245376586914
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7682832479476929, max_abs=5.0, mean_rel=0.15805894136428833, max_rel=2005.39599609375, norm_rel=0.022836187854409218, ref_abs_avg=33.804931640625, test_abs_avg=33.80820083618164
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.7482439279556274, max_abs=4.5, mean_rel=0.17135770618915558, max_rel=1693.453125, norm_rel=0.022705845534801483, ref_abs_avg=33.16716384887695, test_abs_avg=33.17085647583008
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.5408420562744141, max_abs=2.25, mean_rel=0.0899401307106018, max_rel=10.086201667785645, norm_rel=0.02044473960995674, ref_abs_avg=26.987693786621094, test_abs_avg=26.999855041503906
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.719237208366394, max_abs=4.25, mean_rel=0.14707762002944946, max_rel=1011.9601440429688, norm_rel=0.022620465606451035, ref_abs_avg=31.88425064086914, test_abs_avg=31.885740280151367
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.7011871337890625, max_abs=4.0, mean_rel=0.1509595513343811, max_rel=749.4652709960938, norm_rel=0.022338377311825752, ref_abs_avg=31.567081451416016, test_abs_avg=31.56562614440918
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5408163070678711, max_abs=2.125, mean_rel=0.07749006897211075, max_rel=3.3858423233032227, norm_rel=0.022140054032206535, ref_abs_avg=24.490779876708984, test_abs_avg=24.461627960205078
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6768870949745178, max_abs=4.0, mean_rel=0.1619335114955902, max_rel=1182.806396484375, norm_rel=0.022430481389164925, ref_abs_avg=30.289350509643555, test_abs_avg=30.290813446044922
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6659262180328369, max_abs=4.0, mean_rel=0.14437955617904663, max_rel=404.6837158203125, norm_rel=0.022309718653559685, ref_abs_avg=29.978593826293945, test_abs_avg=29.97682762145996
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.5302109718322754, max_abs=2.546875, mean_rel=0.27307042479515076, max_rel=71.30927276611328, norm_rel=0.0227394700050354, ref_abs_avg=22.9495849609375, test_abs_avg=22.938398361206055
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.6492525339126587, max_abs=4.0, mean_rel=0.1392665058374405, max_rel=836.5458374023438, norm_rel=0.022424913942813873, ref_abs_avg=29.071895599365234, test_abs_avg=29.071840286254883
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.6320743560791016, max_abs=3.5859375, mean_rel=0.1378348469734192, max_rel=675.3234252929688, norm_rel=0.022060899063944817, ref_abs_avg=28.758041381835938, test_abs_avg=28.760469436645508
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6013293266296387, max_abs=2.1875, mean_rel=0.12828490138053894, max_rel=26.60903549194336, norm_rel=0.02377457730472088, ref_abs_avg=25.437503814697266, test_abs_avg=25.446739196777344
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7565230131149292, max_abs=5.0, mean_rel=0.15757209062576294, max_rel=1229.153564453125, norm_rel=0.024493025615811348, ref_abs_avg=31.020503997802734, test_abs_avg=31.01898193359375
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.7353383302688599, max_abs=5.0, mean_rel=0.16306129097938538, max_rel=1523.4320068359375, norm_rel=0.02418985404074192, ref_abs_avg=30.54116439819336, test_abs_avg=30.545177459716797
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.5859557390213013, max_abs=2.09765625, mean_rel=0.9356106519699097, max_rel=438.3205871582031, norm_rel=0.02425159513950348, ref_abs_avg=24.395103454589844, test_abs_avg=24.365915298461914
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.7031341195106506, max_abs=4.5, mean_rel=0.15574321150779724, max_rel=971.3302612304688, norm_rel=0.024636315181851387, ref_abs_avg=28.613739013671875, test_abs_avg=28.613723754882812
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.69145268201828, max_abs=4.25, mean_rel=0.16618669033050537, max_rel=1253.099365234375, norm_rel=0.02466766908764839, ref_abs_avg=28.16604232788086, test_abs_avg=28.165983200073242
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5408509373664856, max_abs=2.15625, mean_rel=0.09341958165168762, max_rel=7.81083345413208, norm_rel=0.02460096962749958, ref_abs_avg=22.353845596313477, test_abs_avg=22.31191635131836
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.6497212648391724, max_abs=4.06640625, mean_rel=0.16597722470760345, max_rel=1193.414306640625, norm_rel=0.024458395317196846, ref_abs_avg=26.633140563964844, test_abs_avg=26.634082794189453
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6376667618751526, max_abs=4.03125, mean_rel=0.1734980344772339, max_rel=1640.299072265625, norm_rel=0.024244127795100212, ref_abs_avg=26.390573501586914, test_abs_avg=26.393404006958008
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.5290012359619141, max_abs=2.25, mean_rel=0.0959341824054718, max_rel=3.4638712406158447, norm_rel=0.02469932660460472, ref_abs_avg=21.47699546813965, test_abs_avg=21.46120834350586
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.6122727394104004, max_abs=4.0, mean_rel=0.16007977724075317, max_rel=949.237548828125, norm_rel=0.024130064994096756, ref_abs_avg=25.4411678314209, test_abs_avg=25.438541412353516
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5949770212173462, max_abs=3.75, mean_rel=0.16038164496421814, max_rel=818.3767700195312, norm_rel=0.02419937402009964, ref_abs_avg=24.652324676513672, test_abs_avg=24.65176010131836
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.4614083468914032, max_abs=1.75, mean_rel=0.13306811451911926, max_rel=15.58370590209961, norm_rel=0.0245419442653656, ref_abs_avg=18.75326919555664, test_abs_avg=18.774188995361328
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5755399465560913, max_abs=4.25, mean_rel=0.16076229512691498, max_rel=1604.9649658203125, norm_rel=0.023890158161520958, ref_abs_avg=24.157575607299805, test_abs_avg=24.15741539001465
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5682473182678223, max_abs=3.5, mean_rel=0.1564478576183319, max_rel=1200.335693359375, norm_rel=0.023969605565071106, ref_abs_avg=23.776531219482422, test_abs_avg=23.769466400146484
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4724235534667969, max_abs=1.75, mean_rel=0.10640304535627365, max_rel=6.87561559677124, norm_rel=0.024416670203208923, ref_abs_avg=19.705242156982422, test_abs_avg=19.66704559326172
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5514206886291504, max_abs=3.75, mean_rel=0.15806908905506134, max_rel=936.522216796875, norm_rel=0.023723255842924118, ref_abs_avg=23.268091201782227, test_abs_avg=23.267250061035156
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5363910794258118, max_abs=3.5, mean_rel=0.1501380056142807, max_rel=1345.8336181640625, norm_rel=0.02355092577636242, ref_abs_avg=22.796390533447266, test_abs_avg=22.798912048339844
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.4104471206665039, max_abs=2.109375, mean_rel=0.10515136271715164, max_rel=4.982991695404053, norm_rel=0.022193461656570435, ref_abs_avg=18.7738037109375, test_abs_avg=18.796405792236328
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.5240709185600281, max_abs=3.25, mean_rel=0.15212196111679077, max_rel=852.2342529296875, norm_rel=0.02358067035675049, ref_abs_avg=22.25717544555664, test_abs_avg=22.256324768066406
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.5122572779655457, max_abs=3.25, mean_rel=0.14713354408740997, max_rel=872.8721313476562, norm_rel=0.02334139123558998, ref_abs_avg=21.95943260192871, test_abs_avg=21.959197998046875
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.37565314769744873, max_abs=1.625, mean_rel=0.2440228909254074, max_rel=71.59867858886719, norm_rel=0.021851127967238426, ref_abs_avg=17.330324172973633, test_abs_avg=17.338403701782227
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.4968448877334595, max_abs=3.25, mean_rel=0.14980226755142212, max_rel=665.5003662109375, norm_rel=0.023273322731256485, ref_abs_avg=21.37944793701172, test_abs_avg=21.377479553222656
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.48769474029541016, max_abs=3.0, mean_rel=0.1449030190706253, max_rel=1001.8137817382812, norm_rel=0.023222751915454865, ref_abs_avg=21.056673049926758, test_abs_avg=21.060611724853516
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.46053099632263184, max_abs=1.75, mean_rel=0.16181327402591705, max_rel=42.95709228515625, norm_rel=0.02586880698800087, ref_abs_avg=18.126155853271484, test_abs_avg=18.102436065673828
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5566741228103638, max_abs=3.5625, mean_rel=0.17541034519672394, max_rel=1624.6473388671875, norm_rel=0.024925393983721733, ref_abs_avg=22.386600494384766, test_abs_avg=22.388500213623047
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5465155243873596, max_abs=3.75, mean_rel=0.17938990890979767, max_rel=879.8319702148438, norm_rel=0.024844393134117126, ref_abs_avg=22.063663482666016, test_abs_avg=22.06631851196289
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.404782235622406, max_abs=1.5625, mean_rel=0.13992080092430115, max_rel=21.969799041748047, norm_rel=0.023319266736507416, ref_abs_avg=17.388042449951172, test_abs_avg=17.404611587524414
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.5052783489227295, max_abs=3.25, mean_rel=0.15631911158561707, max_rel=819.1660766601562, norm_rel=0.024310922250151634, ref_abs_avg=20.806182861328125, test_abs_avg=20.806102752685547
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.49577629566192627, max_abs=3.296875, mean_rel=0.15633034706115723, max_rel=932.3192749023438, norm_rel=0.024122584611177444, ref_abs_avg=20.584383010864258, test_abs_avg=20.58246612548828
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.373976469039917, max_abs=1.34375, mean_rel=0.15078282356262207, max_rel=23.542078018188477, norm_rel=0.023081038147211075, ref_abs_avg=16.238204956054688, test_abs_avg=16.253311157226562
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.46698564291000366, max_abs=3.125, mean_rel=0.14681535959243774, max_rel=518.6030883789062, norm_rel=0.023853907361626625, ref_abs_avg=19.573219299316406, test_abs_avg=19.574182510375977
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.4619120955467224, max_abs=3.52734375, mean_rel=0.15694007277488708, max_rel=778.8218383789062, norm_rel=0.023477157577872276, ref_abs_avg=19.682209014892578, test_abs_avg=19.683053970336914
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.3603180944919586, max_abs=1.5, mean_rel=0.5749139785766602, max_rel=102.36593627929688, norm_rel=0.02403414621949196, ref_abs_avg=15.051884651184082, test_abs_avg=15.0715913772583
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.43930765986442566, max_abs=3.0, mean_rel=0.14441482722759247, max_rel=790.9147338867188, norm_rel=0.02339828573167324, ref_abs_avg=18.791175842285156, test_abs_avg=18.791780471801758
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.4323785901069641, max_abs=3.0, mean_rel=0.1496724933385849, max_rel=640.889892578125, norm_rel=0.022772887721657753, ref_abs_avg=18.959678649902344, test_abs_avg=18.960906982421875
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.33379125595092773, max_abs=1.3125, mean_rel=0.1339784860610962, max_rel=9.891095161437988, norm_rel=0.024034876376390457, ref_abs_avg=14.244507789611816, test_abs_avg=14.267118453979492
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.41397902369499207, max_abs=3.0, mean_rel=0.15327803790569305, max_rel=614.5237426757812, norm_rel=0.023096663877367973, ref_abs_avg=17.914640426635742, test_abs_avg=17.915117263793945
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.40871867537498474, max_abs=2.75, mean_rel=0.143716961145401, max_rel=830.3357543945312, norm_rel=0.022877519950270653, ref_abs_avg=17.875686645507812, test_abs_avg=17.87436866760254
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.3262057304382324, max_abs=1.25, mean_rel=0.11097466200590134, max_rel=10.567648887634277, norm_rel=0.022340133786201477, ref_abs_avg=14.788248062133789, test_abs_avg=14.784549713134766
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.3969004452228546, max_abs=3.0, mean_rel=0.13878417015075684, max_rel=591.1635131835938, norm_rel=0.022871941328048706, ref_abs_avg=17.350460052490234, test_abs_avg=17.351234436035156
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.39195334911346436, max_abs=2.75, mean_rel=0.14130666851997375, max_rel=629.7654418945312, norm_rel=0.022431988269090652, ref_abs_avg=17.46544647216797, test_abs_avg=17.472143173217773
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.29200220108032227, max_abs=1.21875, mean_rel=0.0693650022149086, max_rel=1.654390573501587, norm_rel=0.021522032096982002, ref_abs_avg=13.741191864013672, test_abs_avg=13.759382247924805
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.37189608812332153, max_abs=3.0, mean_rel=0.13631579279899597, max_rel=797.9315185546875, norm_rel=0.022191839292645454, ref_abs_avg=16.732677459716797, test_abs_avg=16.733509063720703
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.3661918640136719, max_abs=3.0, mean_rel=0.1320517361164093, max_rel=809.1890869140625, norm_rel=0.021578557789325714, ref_abs_avg=16.944808959960938, test_abs_avg=16.944202423095703
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.3111453056335449, max_abs=1.125, mean_rel=0.1669769287109375, max_rel=29.04311180114746, norm_rel=0.02202802151441574, ref_abs_avg=14.323253631591797, test_abs_avg=14.350204467773438
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.36339184641838074, max_abs=2.8125, mean_rel=0.1467675119638443, max_rel=812.1641845703125, norm_rel=0.021793369203805923, ref_abs_avg=16.622827529907227, test_abs_avg=16.622879028320312
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.3524082899093628, max_abs=3.0, mean_rel=0.12764832377433777, max_rel=543.1080932617188, norm_rel=0.021624956279993057, ref_abs_avg=16.290050506591797, test_abs_avg=16.288555145263672
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.3420374393463135, max_abs=1.71875, mean_rel=0.10343995690345764, max_rel=6.147045135498047, norm_rel=0.02308022789657116, ref_abs_avg=14.494532585144043, test_abs_avg=14.465025901794434
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.40470626950263977, max_abs=3.375, mean_rel=0.14829480648040771, max_rel=919.468505859375, norm_rel=0.02319253422319889, ref_abs_avg=17.470792770385742, test_abs_avg=17.469524383544922
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3964855670928955, max_abs=3.0625, mean_rel=0.15270105004310608, max_rel=866.7623901367188, norm_rel=0.02296055667102337, ref_abs_avg=17.298702239990234, test_abs_avg=17.29721450805664
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.2879842519760132, max_abs=1.4375, mean_rel=0.076778344810009, max_rel=8.427932739257812, norm_rel=0.020972078666090965, ref_abs_avg=13.766035079956055, test_abs_avg=13.753878593444824
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.37199506163597107, max_abs=3.5, mean_rel=0.1428917944431305, max_rel=1039.2008056640625, norm_rel=0.022655699402093887, ref_abs_avg=16.374177932739258, test_abs_avg=16.371856689453125
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.3596058189868927, max_abs=2.6171875, mean_rel=0.15006819367408752, max_rel=527.8436279296875, norm_rel=0.02206142619252205, ref_abs_avg=16.300376892089844, test_abs_avg=16.300395965576172
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3153853416442871, max_abs=1.1875, mean_rel=0.15647441148757935, max_rel=20.821792602539062, norm_rel=0.023065008223056793, ref_abs_avg=13.549736976623535, test_abs_avg=13.560892105102539
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.3454020321369171, max_abs=3.3125, mean_rel=0.1391819417476654, max_rel=803.3630981445312, norm_rel=0.021710488945245743, ref_abs_avg=15.880409240722656, test_abs_avg=15.878966331481934
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.33664053678512573, max_abs=3.078125, mean_rel=0.13864371180534363, max_rel=511.6485290527344, norm_rel=0.02128334902226925, ref_abs_avg=15.75946044921875, test_abs_avg=15.765715599060059
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.27144670486450195, max_abs=1.0, mean_rel=0.11690378189086914, max_rel=18.925064086914062, norm_rel=0.020580528303980827, ref_abs_avg=12.781896591186523, test_abs_avg=12.793657302856445
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.32244354486465454, max_abs=3.0, mean_rel=0.138706237077713, max_rel=657.976318359375, norm_rel=0.02113805152475834, ref_abs_avg=15.251559257507324, test_abs_avg=15.250669479370117
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.31667405366897583, max_abs=2.75, mean_rel=0.13270175457000732, max_rel=596.6161499023438, norm_rel=0.02171844244003296, ref_abs_avg=14.715194702148438, test_abs_avg=14.719158172607422
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.23800110816955566, max_abs=0.8125, mean_rel=0.09819459170103073, max_rel=11.231975555419922, norm_rel=0.01874994859099388, ref_abs_avg=12.907289505004883, test_abs_avg=12.896036148071289
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.30510789155960083, max_abs=3.0, mean_rel=0.12647408246994019, max_rel=479.49835205078125, norm_rel=0.020500626415014267, ref_abs_avg=14.945708274841309, test_abs_avg=14.94482135772705
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.29447638988494873, max_abs=2.5, mean_rel=0.1302911639213562, max_rel=715.1279296875, norm_rel=0.02029114030301571, ref_abs_avg=14.591560363769531, test_abs_avg=14.5992431640625
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.2369251251220703, max_abs=0.96875, mean_rel=0.09200577437877655, max_rel=7.4884796142578125, norm_rel=0.02023736946284771, ref_abs_avg=11.82455062866211, test_abs_avg=11.839390754699707
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.2885472774505615, max_abs=2.99151611328125, mean_rel=0.1255936324596405, max_rel=559.6210327148438, norm_rel=0.020199038088321686, ref_abs_avg=14.370733261108398, test_abs_avg=14.370431900024414
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.27952390909194946, max_abs=2.625, mean_rel=0.1299416422843933, max_rel=668.0205688476562, norm_rel=0.02007417194545269, ref_abs_avg=14.044838905334473, test_abs_avg=14.040092468261719
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.23026490211486816, max_abs=0.9375, mean_rel=0.1261984407901764, max_rel=21.14895248413086, norm_rel=0.020391879603266716, ref_abs_avg=11.655762672424316, test_abs_avg=11.666276931762695
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.2681407928466797, max_abs=3.125, mean_rel=0.1182025894522667, max_rel=638.7261352539062, norm_rel=0.019818060100078583, ref_abs_avg=13.677364349365234, test_abs_avg=13.676497459411621
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.2659582793712616, max_abs=2.6875, mean_rel=0.12350673228502274, max_rel=442.5213623046875, norm_rel=0.019551681354641914, ref_abs_avg=13.74953556060791, test_abs_avg=13.747611999511719
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.21358847618103027, max_abs=1.0, mean_rel=0.1403294950723648, max_rel=14.947990417480469, norm_rel=0.017844874411821365, ref_abs_avg=12.372425079345703, test_abs_avg=12.361356735229492
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.2588757872581482, max_abs=3.0, mean_rel=0.1191074401140213, max_rel=519.3804931640625, norm_rel=0.019565695896744728, ref_abs_avg=13.46575927734375, test_abs_avg=13.464165687561035
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.25282251834869385, max_abs=2.625, mean_rel=0.11940272897481918, max_rel=818.01025390625, norm_rel=0.019399838522076607, ref_abs_avg=13.246963500976562, test_abs_avg=13.247673034667969
liger_forward vs paper_forward output: mean_abs=0.0001542531535960734, max_abs=0.0234375
liger_forward grad[0] vs paper_forward: mean_abs=0.0035817658063024282, max_abs=0.21875, mean_rel=0.025555070489645004, max_rel=63.94723892211914, norm_rel=0.00960596650838852, ref_abs_avg=0.46529603004455566, test_abs_avg=0.46527403593063354
liger_forward grad[1] vs paper_forward: mean_abs=1.6190613508224487, max_abs=16.0, mean_rel=0.051826488226652145, max_rel=135.52293395996094, norm_rel=0.006750668864697218, ref_abs_avg=230.60047912597656, test_abs_avg=230.60061645507812
liger_forward grad[2] vs paper_forward: mean_abs=0.31679868698120117, max_abs=1.25, mean_rel=0.054285578429698944, max_rel=5.266822814941406, norm_rel=0.008725353516638279, ref_abs_avg=37.34083938598633, test_abs_avg=37.353538513183594
liger_forward grad[3] vs paper_forward: mean_abs=0.39101308584213257, max_abs=2.78125, mean_rel=0.06112046539783478, max_rel=748.0609741210938, norm_rel=0.00876252818852663, ref_abs_avg=46.32328796386719, test_abs_avg=46.32305908203125
liger_forward grad[4] vs paper_forward: mean_abs=0.37566328048706055, max_abs=3.0, mean_rel=0.05924975872039795, max_rel=435.31744384765625, norm_rel=0.008536109700798988, ref_abs_avg=45.890220642089844, test_abs_avg=45.88819122314453
liger_forward grad[5] vs paper_forward: mean_abs=0.2974052429199219, max_abs=1.5, mean_rel=0.03034047782421112, max_rel=2.4112062454223633, norm_rel=0.008896129205822945, ref_abs_avg=34.629417419433594, test_abs_avg=34.61086654663086
liger_forward grad[6] vs paper_forward: mean_abs=0.3359207212924957, max_abs=2.125, mean_rel=0.056455448269844055, max_rel=434.4317321777344, norm_rel=0.008514042012393475, ref_abs_avg=41.04568862915039, test_abs_avg=41.044830322265625
liger_forward grad[7] vs paper_forward: mean_abs=0.32475709915161133, max_abs=2.0, mean_rel=0.05872419476509094, max_rel=545.2622680664062, norm_rel=0.008329594507813454, ref_abs_avg=40.641990661621094, test_abs_avg=40.64369201660156
liger_forward grad[8] vs paper_forward: mean_abs=0.2655789852142334, max_abs=1.0, mean_rel=0.0944104939699173, max_rel=11.706613540649414, norm_rel=0.009107282385230064, ref_abs_avg=29.699066162109375, test_abs_avg=29.686025619506836
liger_forward grad[9] vs paper_forward: mean_abs=0.30702948570251465, max_abs=2.0, mean_rel=0.056129783391952515, max_rel=436.58184814453125, norm_rel=0.008394410833716393, ref_abs_avg=38.072540283203125, test_abs_avg=38.07257080078125
liger_forward grad[10] vs paper_forward: mean_abs=0.2955015301704407, max_abs=2.0, mean_rel=0.049654994159936905, max_rel=295.9302062988281, norm_rel=0.008188611827790737, ref_abs_avg=37.6262092590332, test_abs_avg=37.626522064208984
liger_forward grad[11] vs paper_forward: mean_abs=0.2437305450439453, max_abs=1.0, mean_rel=0.028903890401124954, max_rel=1.1060047149658203, norm_rel=0.00854272861033678, ref_abs_avg=30.04815673828125, test_abs_avg=30.044538497924805
liger_forward grad[12] vs paper_forward: mean_abs=0.28225982189178467, max_abs=2.0, mean_rel=0.05455070734024048, max_rel=480.98175048828125, norm_rel=0.008221595548093319, ref_abs_avg=35.80322265625, test_abs_avg=35.803157806396484
liger_forward grad[13] vs paper_forward: mean_abs=0.2724345028400421, max_abs=2.0, mean_rel=0.053709715604782104, max_rel=786.0368041992188, norm_rel=0.008072346448898315, ref_abs_avg=35.310455322265625, test_abs_avg=35.3109130859375
liger_forward grad[14] vs paper_forward: mean_abs=0.20908069610595703, max_abs=0.8125, mean_rel=0.04941838979721069, max_rel=4.714907646179199, norm_rel=0.008016586303710938, ref_abs_avg=26.905067443847656, test_abs_avg=26.899425506591797
liger_forward grad[15] vs paper_forward: mean_abs=0.2608063220977783, max_abs=2.0, mean_rel=0.05422984063625336, max_rel=317.98162841796875, norm_rel=0.008059334941208363, ref_abs_avg=33.804931640625, test_abs_avg=33.8048210144043
liger_forward grad[16] vs paper_forward: mean_abs=0.25296884775161743, max_abs=2.0, mean_rel=0.05177212879061699, max_rel=381.23394775390625, norm_rel=0.007987622171640396, ref_abs_avg=33.16716384887695, test_abs_avg=33.16813659667969
liger_forward grad[17] vs paper_forward: mean_abs=0.20181751251220703, max_abs=0.875, mean_rel=0.030518535524606705, max_rel=3.4135379791259766, norm_rel=0.008031493984162807, ref_abs_avg=26.987693786621094, test_abs_avg=26.975872039794922
liger_forward grad[18] vs paper_forward: mean_abs=0.2420623004436493, max_abs=1.5703125, mean_rel=0.052082955837249756, max_rel=461.5828552246094, norm_rel=0.007935060188174248, ref_abs_avg=31.88425064086914, test_abs_avg=31.883758544921875
liger_forward grad[19] vs paper_forward: mean_abs=0.23180076479911804, max_abs=1.5, mean_rel=0.052889350801706314, max_rel=510.97711181640625, norm_rel=0.007712278049439192, ref_abs_avg=31.567081451416016, test_abs_avg=31.567276000976562
liger_forward grad[20] vs paper_forward: mean_abs=0.2028818130493164, max_abs=0.75, mean_rel=0.0335114523768425, max_rel=2.2204055786132812, norm_rel=0.008465071208775043, ref_abs_avg=24.490779876708984, test_abs_avg=24.500940322875977
liger_forward grad[21] vs paper_forward: mean_abs=0.22657369077205658, max_abs=1.5, mean_rel=0.052901409566402435, max_rel=312.7025451660156, norm_rel=0.007824232801795006, ref_abs_avg=30.289350509643555, test_abs_avg=30.288822174072266
liger_forward grad[22] vs paper_forward: mean_abs=0.21800154447555542, max_abs=1.5, mean_rel=0.05023355036973953, max_rel=165.1899871826172, norm_rel=0.007634907029569149, ref_abs_avg=29.978593826293945, test_abs_avg=29.978404998779297
liger_forward grad[23] vs paper_forward: mean_abs=0.17093420028686523, max_abs=0.75, mean_rel=0.048039913177490234, max_rel=10.326706886291504, norm_rel=0.0078094517812132835, ref_abs_avg=22.9495849609375, test_abs_avg=22.956315994262695
liger_forward grad[24] vs paper_forward: mean_abs=0.2138185203075409, max_abs=1.5, mean_rel=0.045575790107250214, max_rel=229.02943420410156, norm_rel=0.007721836678683758, ref_abs_avg=29.071895599365234, test_abs_avg=29.0718994140625
liger_forward grad[25] vs paper_forward: mean_abs=0.20488522946834564, max_abs=1.25, mean_rel=0.04528093710541725, max_rel=160.41757202148438, norm_rel=0.007492174860090017, ref_abs_avg=28.758041381835938, test_abs_avg=28.75722885131836
liger_forward grad[26] vs paper_forward: mean_abs=0.19426584243774414, max_abs=0.75, mean_rel=0.04187007248401642, max_rel=5.021986961364746, norm_rel=0.008056402206420898, ref_abs_avg=25.437503814697266, test_abs_avg=25.435016632080078
liger_forward grad[27] vs paper_forward: mean_abs=0.23652535676956177, max_abs=2.0, mean_rel=0.048539020121097565, max_rel=265.7677307128906, norm_rel=0.007979188114404678, ref_abs_avg=31.020503997802734, test_abs_avg=31.019968032836914
liger_forward grad[28] vs paper_forward: mean_abs=0.225724995136261, max_abs=1.5, mean_rel=0.05315864086151123, max_rel=427.48468017578125, norm_rel=0.007756425999104977, ref_abs_avg=30.54116439819336, test_abs_avg=30.54116439819336
liger_forward grad[29] vs paper_forward: mean_abs=0.18748769164085388, max_abs=0.75, mean_rel=0.19374482333660126, max_rel=86.2874984741211, norm_rel=0.008094023913145065, ref_abs_avg=24.395103454589844, test_abs_avg=24.396860122680664
liger_forward grad[30] vs paper_forward: mean_abs=0.21273162961006165, max_abs=1.5, mean_rel=0.04921753704547882, max_rel=339.1495361328125, norm_rel=0.007795398589223623, ref_abs_avg=28.613739013671875, test_abs_avg=28.61382293701172
liger_forward grad[31] vs paper_forward: mean_abs=0.20608775317668915, max_abs=1.5, mean_rel=0.05584953725337982, max_rel=831.530517578125, norm_rel=0.007687567733228207, ref_abs_avg=28.16604232788086, test_abs_avg=28.166034698486328
liger_forward grad[32] vs paper_forward: mean_abs=0.15862751007080078, max_abs=0.75, mean_rel=0.02523692697286606, max_rel=0.8538558483123779, norm_rel=0.00761389872059226, ref_abs_avg=22.353845596313477, test_abs_avg=22.358875274658203
liger_forward grad[33] vs paper_forward: mean_abs=0.19308984279632568, max_abs=1.265625, mean_rel=0.047240279614925385, max_rel=253.83929443359375, norm_rel=0.007612109649926424, ref_abs_avg=26.633140563964844, test_abs_avg=26.632797241210938
liger_forward grad[34] vs paper_forward: mean_abs=0.18579131364822388, max_abs=1.5, mean_rel=0.04918059706687927, max_rel=339.49609375, norm_rel=0.007420673035085201, ref_abs_avg=26.390573501586914, test_abs_avg=26.39071273803711
liger_forward grad[35] vs paper_forward: mean_abs=0.16208460927009583, max_abs=0.75, mean_rel=0.030573803931474686, max_rel=2.234943389892578, norm_rel=0.007760219741612673, ref_abs_avg=21.47699546813965, test_abs_avg=21.464466094970703
liger_forward grad[36] vs paper_forward: mean_abs=0.18008197844028473, max_abs=1.125, mean_rel=0.04961013421416283, max_rel=364.2444152832031, norm_rel=0.007437344174832106, ref_abs_avg=25.4411678314209, test_abs_avg=25.440149307250977
liger_forward grad[37] vs paper_forward: mean_abs=0.1726360321044922, max_abs=1.25, mean_rel=0.0470723882317543, max_rel=315.6854248046875, norm_rel=0.00738219590857625, ref_abs_avg=24.652324676513672, test_abs_avg=24.651042938232422
liger_forward grad[38] vs paper_forward: mean_abs=0.13975265622138977, max_abs=0.625, mean_rel=0.09648396074771881, max_rel=29.94121551513672, norm_rel=0.007874319329857826, ref_abs_avg=18.75326919555664, test_abs_avg=18.756441116333008
liger_forward grad[39] vs paper_forward: mean_abs=0.1685512661933899, max_abs=1.0, mean_rel=0.04611228406429291, max_rel=286.89056396484375, norm_rel=0.007343961391597986, ref_abs_avg=24.157575607299805, test_abs_avg=24.15760040283203
liger_forward grad[40] vs paper_forward: mean_abs=0.16288642585277557, max_abs=1.0, mean_rel=0.04608194902539253, max_rel=330.6021728515625, norm_rel=0.007231818977743387, ref_abs_avg=23.776531219482422, test_abs_avg=23.777252197265625
liger_forward grad[41] vs paper_forward: mean_abs=0.1329820156097412, max_abs=0.6875, mean_rel=0.030399592593312263, max_rel=2.3740503787994385, norm_rel=0.007322358898818493, ref_abs_avg=19.705242156982422, test_abs_avg=19.699066162109375
liger_forward grad[42] vs paper_forward: mean_abs=0.15905949473381042, max_abs=1.0, mean_rel=0.045483991503715515, max_rel=248.1798095703125, norm_rel=0.007201504893600941, ref_abs_avg=23.268091201782227, test_abs_avg=23.26766586303711
liger_forward grad[43] vs paper_forward: mean_abs=0.15252220630645752, max_abs=1.0, mean_rel=0.04399378225207329, max_rel=351.1253662109375, norm_rel=0.007062929682433605, ref_abs_avg=22.796390533447266, test_abs_avg=22.79613494873047
liger_forward grad[44] vs paper_forward: mean_abs=0.12605762481689453, max_abs=0.58984375, mean_rel=0.045371804386377335, max_rel=9.392159461975098, norm_rel=0.007118703331798315, ref_abs_avg=18.7738037109375, test_abs_avg=18.770694732666016
liger_forward grad[45] vs paper_forward: mean_abs=0.15090379118919373, max_abs=1.0, mean_rel=0.04267982393503189, max_rel=166.1916046142578, norm_rel=0.007146766874939203, ref_abs_avg=22.25717544555664, test_abs_avg=22.256732940673828
liger_forward grad[46] vs paper_forward: mean_abs=0.1445571333169937, max_abs=1.0, mean_rel=0.040501438081264496, max_rel=188.19406127929688, norm_rel=0.006967590190470219, ref_abs_avg=21.95943260192871, test_abs_avg=21.959762573242188
liger_forward grad[47] vs paper_forward: mean_abs=0.1130303144454956, max_abs=0.5, mean_rel=0.07155604660511017, max_rel=22.736186981201172, norm_rel=0.006837896537035704, ref_abs_avg=17.330324172973633, test_abs_avg=17.326181411743164
liger_forward grad[48] vs paper_forward: mean_abs=0.14249667525291443, max_abs=1.0, mean_rel=0.04415902495384216, max_rel=444.9710998535156, norm_rel=0.007039212621748447, ref_abs_avg=21.37944793701172, test_abs_avg=21.378772735595703
liger_forward grad[49] vs paper_forward: mean_abs=0.13723237812519073, max_abs=1.0, mean_rel=0.04073994606733322, max_rel=181.2935333251953, norm_rel=0.006898531224578619, ref_abs_avg=21.056673049926758, test_abs_avg=21.056575775146484
liger_forward grad[50] vs paper_forward: mean_abs=0.1356210708618164, max_abs=0.5, mean_rel=0.059439584612846375, max_rel=18.4188232421875, norm_rel=0.007890471257269382, ref_abs_avg=18.126155853271484, test_abs_avg=18.115135192871094
liger_forward grad[51] vs paper_forward: mean_abs=0.16115903854370117, max_abs=1.125, mean_rel=0.04987529665231705, max_rel=460.37109375, norm_rel=0.007557058706879616, ref_abs_avg=22.386600494384766, test_abs_avg=22.387367248535156
liger_forward grad[52] vs paper_forward: mean_abs=0.15570500493049622, max_abs=1.0, mean_rel=0.051404405385255814, max_rel=340.2037658691406, norm_rel=0.007409193087369204, ref_abs_avg=22.063663482666016, test_abs_avg=22.065876007080078
liger_forward grad[53] vs paper_forward: mean_abs=0.1154181957244873, max_abs=0.625, mean_rel=0.03747347742319107, max_rel=5.949195384979248, norm_rel=0.007169072981923819, ref_abs_avg=17.388042449951172, test_abs_avg=17.40048599243164
liger_forward grad[54] vs paper_forward: mean_abs=0.14502233266830444, max_abs=1.0, mean_rel=0.044135503470897675, max_rel=411.145751953125, norm_rel=0.007321903016418219, ref_abs_avg=20.806182861328125, test_abs_avg=20.806724548339844
liger_forward grad[55] vs paper_forward: mean_abs=0.13980883359909058, max_abs=1.0, mean_rel=0.04432646930217743, max_rel=243.31024169921875, norm_rel=0.007162683177739382, ref_abs_avg=20.584383010864258, test_abs_avg=20.584266662597656
liger_forward grad[56] vs paper_forward: mean_abs=0.11550641059875488, max_abs=0.5, mean_rel=0.06546097993850708, max_rel=12.776278495788574, norm_rel=0.007500545587390661, ref_abs_avg=16.238204956054688, test_abs_avg=16.237857818603516
liger_forward grad[57] vs paper_forward: mean_abs=0.1331602931022644, max_abs=1.0, mean_rel=0.04209839925169945, max_rel=188.51695251464844, norm_rel=0.007160348817706108, ref_abs_avg=19.573219299316406, test_abs_avg=19.573150634765625
liger_forward grad[58] vs paper_forward: mean_abs=0.12914134562015533, max_abs=1.0, mean_rel=0.04350355267524719, max_rel=156.27061462402344, norm_rel=0.006934943608939648, ref_abs_avg=19.682209014892578, test_abs_avg=19.681480407714844
liger_forward grad[59] vs paper_forward: mean_abs=0.10923102498054504, max_abs=0.5, mean_rel=0.28634512424468994, max_rel=50.565826416015625, norm_rel=0.007648189552128315, ref_abs_avg=15.051884651184082, test_abs_avg=15.056599617004395
liger_forward grad[60] vs paper_forward: mean_abs=0.12499208748340607, max_abs=1.0, mean_rel=0.04110279306769371, max_rel=334.42620849609375, norm_rel=0.007019022945314646, ref_abs_avg=18.791175842285156, test_abs_avg=18.79137420654297
liger_forward grad[61] vs paper_forward: mean_abs=0.12121927738189697, max_abs=1.0, mean_rel=0.04382972791790962, max_rel=230.45318603515625, norm_rel=0.0067591615952551365, ref_abs_avg=18.959678649902344, test_abs_avg=18.95956802368164
liger_forward grad[62] vs paper_forward: mean_abs=0.09778666496276855, max_abs=0.40625, mean_rel=0.04517325386404991, max_rel=4.038449287414551, norm_rel=0.007417671382427216, ref_abs_avg=14.244507789611816, test_abs_avg=14.247627258300781
liger_forward grad[63] vs paper_forward: mean_abs=0.11743462830781937, max_abs=1.0, mean_rel=0.04224909469485283, max_rel=222.13067626953125, norm_rel=0.006916329730302095, ref_abs_avg=17.914640426635742, test_abs_avg=17.914871215820312
liger_forward grad[64] vs paper_forward: mean_abs=0.11514787375926971, max_abs=1.0, mean_rel=0.04038623347878456, max_rel=182.31455993652344, norm_rel=0.006823481060564518, ref_abs_avg=17.875686645507812, test_abs_avg=17.875225067138672
liger_forward grad[65] vs paper_forward: mean_abs=0.08897542953491211, max_abs=0.5, mean_rel=0.023669838905334473, max_rel=1.3749688863754272, norm_rel=0.006662480998784304, ref_abs_avg=14.788248062133789, test_abs_avg=14.785744667053223
liger_forward grad[66] vs paper_forward: mean_abs=0.11138537526130676, max_abs=1.0, mean_rel=0.03793587535619736, max_rel=149.99301147460938, norm_rel=0.006800897419452667, ref_abs_avg=17.350460052490234, test_abs_avg=17.35049057006836
liger_forward grad[67] vs paper_forward: mean_abs=0.10824167728424072, max_abs=1.0, mean_rel=0.03815170377492905, max_rel=196.8258056640625, norm_rel=0.006596039514988661, ref_abs_avg=17.46544647216797, test_abs_avg=17.465007781982422
liger_forward grad[68] vs paper_forward: mean_abs=0.09087437391281128, max_abs=0.3984375, mean_rel=0.02380022592842579, max_rel=1.0265103578567505, norm_rel=0.007014099508523941, ref_abs_avg=13.741191864013672, test_abs_avg=13.739262580871582
liger_forward grad[69] vs paper_forward: mean_abs=0.10503752529621124, max_abs=1.0, mean_rel=0.03970181941986084, max_rel=296.30181884765625, norm_rel=0.006639812607318163, ref_abs_avg=16.732677459716797, test_abs_avg=16.732925415039062
liger_forward grad[70] vs paper_forward: mean_abs=0.10194900631904602, max_abs=1.0, mean_rel=0.03859834372997284, max_rel=177.71632385253906, norm_rel=0.006411734502762556, ref_abs_avg=16.944808959960938, test_abs_avg=16.945968627929688
liger_forward grad[71] vs paper_forward: mean_abs=0.08198334276676178, max_abs=0.375, mean_rel=0.02421608380973339, max_rel=1.896884560585022, norm_rel=0.006266104057431221, ref_abs_avg=14.323253631591797, test_abs_avg=14.320911407470703
liger_forward grad[72] vs paper_forward: mean_abs=0.10189691185951233, max_abs=1.0, mean_rel=0.040424324572086334, max_rel=252.9633331298828, norm_rel=0.006491741165518761, ref_abs_avg=16.622827529907227, test_abs_avg=16.622652053833008
liger_forward grad[73] vs paper_forward: mean_abs=0.09754709899425507, max_abs=0.8125, mean_rel=0.03474760055541992, max_rel=84.98777770996094, norm_rel=0.006388551089912653, ref_abs_avg=16.290050506591797, test_abs_avg=16.289215087890625
liger_forward grad[74] vs paper_forward: mean_abs=0.09393644332885742, max_abs=0.5, mean_rel=0.031196072697639465, max_rel=2.8237040042877197, norm_rel=0.006879942025989294, ref_abs_avg=14.494532585144043, test_abs_avg=14.490911483764648
liger_forward grad[75] vs paper_forward: mean_abs=0.1179206520318985, max_abs=1.0, mean_rel=0.04313325881958008, max_rel=220.51327514648438, norm_rel=0.007113650906831026, ref_abs_avg=17.470792770385742, test_abs_avg=17.470935821533203
liger_forward grad[76] vs paper_forward: mean_abs=0.11465714126825333, max_abs=1.0, mean_rel=0.04223206639289856, max_rel=191.1934356689453, norm_rel=0.0070184278301894665, ref_abs_avg=17.298702239990234, test_abs_avg=17.30028533935547
liger_forward grad[77] vs paper_forward: mean_abs=0.0950162410736084, max_abs=0.34375, mean_rel=0.02521705999970436, max_rel=1.0427416563034058, norm_rel=0.007066880352795124, ref_abs_avg=13.766035079956055, test_abs_avg=13.772909164428711
liger_forward grad[78] vs paper_forward: mean_abs=0.10666333138942719, max_abs=1.0, mean_rel=0.0425926074385643, max_rel=201.7973175048828, norm_rel=0.006871110759675503, ref_abs_avg=16.374177932739258, test_abs_avg=16.374099731445312
liger_forward grad[79] vs paper_forward: mean_abs=0.10372768342494965, max_abs=1.0, mean_rel=0.04434233158826828, max_rel=269.9354553222656, norm_rel=0.00675865076482296, ref_abs_avg=16.300376892089844, test_abs_avg=16.301891326904297
liger_forward grad[80] vs paper_forward: mean_abs=0.08482867479324341, max_abs=0.375, mean_rel=0.030018014833331108, max_rel=1.7508111000061035, norm_rel=0.006649272982031107, ref_abs_avg=13.549736976623535, test_abs_avg=13.547369003295898
liger_forward grad[81] vs paper_forward: mean_abs=0.09960001707077026, max_abs=1.0, mean_rel=0.0395088754594326, max_rel=273.7018127441406, norm_rel=0.006640693172812462, ref_abs_avg=15.880409240722656, test_abs_avg=15.88068962097168
liger_forward grad[82] vs paper_forward: mean_abs=0.09604472666978836, max_abs=1.0, mean_rel=0.03993602842092514, max_rel=181.00526428222656, norm_rel=0.006485287565737963, ref_abs_avg=15.75946044921875, test_abs_avg=15.760168075561523
liger_forward grad[83] vs paper_forward: mean_abs=0.07811689376831055, max_abs=0.37109375, mean_rel=0.026559650897979736, max_rel=2.3112833499908447, norm_rel=0.006457179319113493, ref_abs_avg=12.781896591186523, test_abs_avg=12.77522087097168
liger_forward grad[84] vs paper_forward: mean_abs=0.09263588488101959, max_abs=1.0, mean_rel=0.038537509739398956, max_rel=270.3133544921875, norm_rel=0.006465908605605364, ref_abs_avg=15.251559257507324, test_abs_avg=15.251555442810059
liger_forward grad[85] vs paper_forward: mean_abs=0.0887513980269432, max_abs=0.890625, mean_rel=0.03607868403196335, max_rel=131.96743774414062, norm_rel=0.00647916691377759, ref_abs_avg=14.715194702148438, test_abs_avg=14.715278625488281
liger_forward grad[86] vs paper_forward: mean_abs=0.07442879676818848, max_abs=0.25390625, mean_rel=0.02872767299413681, max_rel=3.6100893020629883, norm_rel=0.006213290151208639, ref_abs_avg=12.907289505004883, test_abs_avg=12.904766082763672
liger_forward grad[87] vs paper_forward: mean_abs=0.08786247670650482, max_abs=0.8125, mean_rel=0.03615264594554901, max_rel=127.57736206054688, norm_rel=0.006305342074483633, ref_abs_avg=14.945708274841309, test_abs_avg=14.945579528808594
liger_forward grad[88] vs paper_forward: mean_abs=0.08492009341716766, max_abs=1.0, mean_rel=0.03758903592824936, max_rel=125.2853012084961, norm_rel=0.006255062762647867, ref_abs_avg=14.591560363769531, test_abs_avg=14.593042373657227
liger_forward grad[89] vs paper_forward: mean_abs=0.07576853036880493, max_abs=0.375, mean_rel=0.027499210089445114, max_rel=3.036357879638672, norm_rel=0.006880386732518673, ref_abs_avg=11.82455062866211, test_abs_avg=11.83157730102539
liger_forward grad[90] vs paper_forward: mean_abs=0.08293864876031876, max_abs=1.0, mean_rel=0.03528102487325668, max_rel=166.6474609375, norm_rel=0.006213175132870674, ref_abs_avg=14.370733261108398, test_abs_avg=14.370798110961914
liger_forward grad[91] vs paper_forward: mean_abs=0.07951891422271729, max_abs=1.0, mean_rel=0.03570227697491646, max_rel=161.02247619628906, norm_rel=0.006132570095360279, ref_abs_avg=14.044838905334473, test_abs_avg=14.045766830444336
liger_forward grad[92] vs paper_forward: mean_abs=0.0685737133026123, max_abs=0.25, mean_rel=0.033113449811935425, max_rel=5.084695339202881, norm_rel=0.006436641328036785, ref_abs_avg=11.655762672424316, test_abs_avg=11.655202865600586
liger_forward grad[93] vs paper_forward: mean_abs=0.07665282487869263, max_abs=1.0, mean_rel=0.03343123942613602, max_rel=110.0097885131836, norm_rel=0.006080294027924538, ref_abs_avg=13.677364349365234, test_abs_avg=13.677340507507324
liger_forward grad[94] vs paper_forward: mean_abs=0.07609353214502335, max_abs=0.828125, mean_rel=0.03522326052188873, max_rel=106.40776062011719, norm_rel=0.006024767644703388, ref_abs_avg=13.74953556060791, test_abs_avg=13.7468900680542
liger_forward grad[95] vs paper_forward: mean_abs=0.06361961364746094, max_abs=0.25, mean_rel=0.04009030759334564, max_rel=6.607492446899414, norm_rel=0.005718362983316183, ref_abs_avg=12.372425079345703, test_abs_avg=12.372638702392578
liger_forward grad[96] vs paper_forward: mean_abs=0.07401564717292786, max_abs=1.0, mean_rel=0.033536046743392944, max_rel=141.33389282226562, norm_rel=0.006024322006851435, ref_abs_avg=13.46575927734375, test_abs_avg=13.465594291687012
liger_forward grad[97] vs paper_forward: mean_abs=0.07325004041194916, max_abs=1.0, mean_rel=0.03478718921542168, max_rel=142.06068420410156, norm_rel=0.0060774656012654305, ref_abs_avg=13.246963500976562, test_abs_avg=13.24997615814209
identity layers + randn queries
paper_forward fwd+bwd:  112.813 ms
paper_forward bwd-only: 88.964 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB
liger_forward fwd+bwd:  46.286 ms
liger_forward bwd-only: 33.884 ms
liger_forward peak allocated: fwd=7.727 GiB, fwd+bwd=7.727 GiB
liger_forward peak reserved:  fwd=7.775 GiB, fwd+bwd=8.088 GiB
torch_compile_phases_forward fwd+bwd:  48.536 ms
torch_compile_phases_forward bwd-only: 39.379 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.752 GiB
production_forward fwd+bwd:  33.801 ms
production_forward bwd-only: 28.862 ms
production_forward peak allocated: fwd=1.174 GiB, fwd+bwd=5.176 GiB
production_forward peak reserved:  fwd=1.238 GiB, fwd+bwd=5.238 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016255807131528854, max_abs=0.03515625
production_forward grad[0] vs paper_forward: mean_abs=0.008314978331327438, max_abs=0.375, mean_rel=0.07176537811756134, max_rel=95.93932342529297, norm_rel=0.01965768076479435, ref_abs_avg=0.45913049578666687, test_abs_avg=0.4591459631919861
production_forward grad[1] vs paper_forward: mean_abs=5.106837749481201, max_abs=48.0, mean_rel=0.1342320591211319, max_rel=108.174072265625, norm_rel=0.020721761509776115, ref_abs_avg=222.1953125, test_abs_avg=222.19699096679688
production_forward grad[2] vs paper_forward: mean_abs=0.9061627388000488, max_abs=3.0, mean_rel=0.12664082646369934, max_rel=17.938732147216797, norm_rel=0.024542812258005142, ref_abs_avg=36.246002197265625, test_abs_avg=36.20183563232422
production_forward grad[3] vs paper_forward: mean_abs=1.0563602447509766, max_abs=7.0, mean_rel=0.1618613451719284, max_rel=1326.6822509765625, norm_rel=0.022890998050570488, ref_abs_avg=46.43198776245117, test_abs_avg=46.43824005126953
production_forward grad[4] vs paper_forward: mean_abs=1.0266211032867432, max_abs=7.0, mean_rel=0.1541244387626648, max_rel=902.8320922851562, norm_rel=0.022500814869999886, ref_abs_avg=45.87211608886719, test_abs_avg=45.87501525878906
production_forward grad[5] vs paper_forward: mean_abs=0.7469949722290039, max_abs=3.4375, mean_rel=0.08882001042366028, max_rel=6.3720173835754395, norm_rel=0.022373689338564873, ref_abs_avg=33.35903549194336, test_abs_avg=33.32238006591797
production_forward grad[6] vs paper_forward: mean_abs=0.9263572692871094, max_abs=5.75, mean_rel=0.1729656159877777, max_rel=2979.39990234375, norm_rel=0.022669067606329918, ref_abs_avg=41.08968734741211, test_abs_avg=41.09800720214844
production_forward grad[7] vs paper_forward: mean_abs=0.9021357893943787, max_abs=5.25, mean_rel=0.15940867364406586, max_rel=1222.3101806640625, norm_rel=0.022376861423254013, ref_abs_avg=40.584659576416016, test_abs_avg=40.59574890136719
production_forward grad[8] vs paper_forward: mean_abs=0.687870979309082, max_abs=2.875, mean_rel=0.1123654693365097, max_rel=22.23308563232422, norm_rel=0.022441441193223, ref_abs_avg=31.33993911743164, test_abs_avg=31.36187171936035
production_forward grad[9] vs paper_forward: mean_abs=0.8474907875061035, max_abs=5.25, mean_rel=0.1480265110731125, max_rel=1047.7486572265625, norm_rel=0.022432930767536163, ref_abs_avg=37.92661666870117, test_abs_avg=37.928916931152344
production_forward grad[10] vs paper_forward: mean_abs=0.8224896192550659, max_abs=5.5078125, mean_rel=0.13474228978157043, max_rel=924.519287109375, norm_rel=0.0223167035728693, ref_abs_avg=37.09149932861328, test_abs_avg=37.09693145751953
production_forward grad[11] vs paper_forward: mean_abs=0.6041204929351807, max_abs=2.3125, mean_rel=0.16462817788124084, max_rel=39.74518585205078, norm_rel=0.020692948251962662, ref_abs_avg=29.890243530273438, test_abs_avg=29.981170654296875
production_forward grad[12] vs paper_forward: mean_abs=0.7770200967788696, max_abs=5.25, mean_rel=0.1562047004699707, max_rel=955.07861328125, norm_rel=0.022340087220072746, ref_abs_avg=34.97532653808594, test_abs_avg=34.977684020996094
production_forward grad[13] vs paper_forward: mean_abs=0.7595647573471069, max_abs=4.5, mean_rel=0.1505417674779892, max_rel=1204.021728515625, norm_rel=0.022051291540265083, ref_abs_avg=34.594703674316406, test_abs_avg=34.59367370605469
production_forward grad[14] vs paper_forward: mean_abs=0.60260009765625, max_abs=2.5625, mean_rel=0.08181130886077881, max_rel=3.365196943283081, norm_rel=0.021973514929413795, ref_abs_avg=27.788984298706055, test_abs_avg=27.80912971496582
production_forward grad[15] vs paper_forward: mean_abs=0.7265122532844543, max_abs=4.4912109375, mean_rel=0.15148326754570007, max_rel=1830.4814453125, norm_rel=0.022137651219964027, ref_abs_avg=32.99018859863281, test_abs_avg=32.99061584472656
production_forward grad[16] vs paper_forward: mean_abs=0.7102341651916504, max_abs=4.0625, mean_rel=0.14195135235786438, max_rel=674.4412841796875, norm_rel=0.021936537697911263, ref_abs_avg=32.55807876586914, test_abs_avg=32.55773162841797
production_forward grad[17] vs paper_forward: mean_abs=0.5813474655151367, max_abs=2.125, mean_rel=0.08127599954605103, max_rel=7.335999488830566, norm_rel=0.022794319316744804, ref_abs_avg=25.95720672607422, test_abs_avg=25.931873321533203
production_forward grad[18] vs paper_forward: mean_abs=0.6865979433059692, max_abs=4.5, mean_rel=0.14706745743751526, max_rel=610.9429321289062, norm_rel=0.02209579385817051, ref_abs_avg=31.229530334472656, test_abs_avg=31.230663299560547
production_forward grad[19] vs paper_forward: mean_abs=0.6674154996871948, max_abs=4.75, mean_rel=0.14037498831748962, max_rel=1512.5234375, norm_rel=0.02182791940867901, ref_abs_avg=30.747360229492188, test_abs_avg=30.749670028686523
production_forward grad[20] vs paper_forward: mean_abs=0.5497565269470215, max_abs=1.875, mean_rel=0.10877269506454468, max_rel=8.563231468200684, norm_rel=0.022680718451738358, ref_abs_avg=24.46982192993164, test_abs_avg=24.441356658935547
production_forward grad[21] vs paper_forward: mean_abs=0.6486942768096924, max_abs=4.0556640625, mean_rel=0.15093031525611877, max_rel=1024.8236083984375, norm_rel=0.021964039653539658, ref_abs_avg=29.63863754272461, test_abs_avg=29.63923454284668
production_forward grad[22] vs paper_forward: mean_abs=0.6325598955154419, max_abs=3.75, mean_rel=0.15421265363693237, max_rel=1520.8836669921875, norm_rel=0.021872157230973244, ref_abs_avg=29.050079345703125, test_abs_avg=29.04729461669922
production_forward grad[23] vs paper_forward: mean_abs=0.5202587842941284, max_abs=2.4375, mean_rel=0.13339772820472717, max_rel=11.169007301330566, norm_rel=0.022797903046011925, ref_abs_avg=23.581113815307617, test_abs_avg=23.611156463623047
production_forward grad[24] vs paper_forward: mean_abs=0.6179236173629761, max_abs=4.125, mean_rel=0.13696737587451935, max_rel=1478.3521728515625, norm_rel=0.021835535764694214, ref_abs_avg=28.44811248779297, test_abs_avg=28.448123931884766
production_forward grad[25] vs paper_forward: mean_abs=0.6014180779457092, max_abs=3.75, mean_rel=0.13422152400016785, max_rel=665.7423706054688, norm_rel=0.021431973204016685, ref_abs_avg=28.21490478515625, test_abs_avg=28.21499252319336
production_forward grad[26] vs paper_forward: mean_abs=0.5960912704467773, max_abs=2.375, mean_rel=0.11146533489227295, max_rel=7.302740573883057, norm_rel=0.023803021758794785, ref_abs_avg=24.908193588256836, test_abs_avg=24.84967803955078
production_forward grad[27] vs paper_forward: mean_abs=0.7182050943374634, max_abs=4.5, mean_rel=0.16473866999149323, max_rel=1495.5467529296875, norm_rel=0.0239922646433115, ref_abs_avg=30.046279907226562, test_abs_avg=30.047348022460938
production_forward grad[28] vs paper_forward: mean_abs=0.6997135281562805, max_abs=4.8125, mean_rel=0.14628866314888, max_rel=749.9661254882812, norm_rel=0.023635882884263992, ref_abs_avg=29.70751953125, test_abs_avg=29.705827713012695
production_forward grad[29] vs paper_forward: mean_abs=0.525821328163147, max_abs=1.78125, mean_rel=0.09161872416734695, max_rel=6.602275848388672, norm_rel=0.022025123238563538, ref_abs_avg=23.704418182373047, test_abs_avg=23.679719924926758
production_forward grad[30] vs paper_forward: mean_abs=0.6760935187339783, max_abs=4.125, mean_rel=0.17056742310523987, max_rel=1331.9766845703125, norm_rel=0.024286318570375443, ref_abs_avg=27.929651260375977, test_abs_avg=27.929607391357422
production_forward grad[31] vs paper_forward: mean_abs=0.6669498085975647, max_abs=4.125, mean_rel=0.1441497802734375, max_rel=462.6152038574219, norm_rel=0.02421058714389801, ref_abs_avg=27.645709991455078, test_abs_avg=27.65091323852539
production_forward grad[32] vs paper_forward: mean_abs=0.49942874908447266, max_abs=2.25, mean_rel=0.06915511190891266, max_rel=2.7492001056671143, norm_rel=0.024061203002929688, ref_abs_avg=20.771821975708008, test_abs_avg=20.76517105102539
production_forward grad[33] vs paper_forward: mean_abs=0.6235552430152893, max_abs=4.0, mean_rel=0.15472039580345154, max_rel=1209.3333740234375, norm_rel=0.024155082181096077, ref_abs_avg=25.878665924072266, test_abs_avg=25.87862777709961
production_forward grad[34] vs paper_forward: mean_abs=0.6137799024581909, max_abs=4.0, mean_rel=0.14672325551509857, max_rel=480.2340087890625, norm_rel=0.024234630167484283, ref_abs_avg=25.426624298095703, test_abs_avg=25.42589569091797
production_forward grad[35] vs paper_forward: mean_abs=0.483994722366333, max_abs=2.375, mean_rel=0.2099849432706833, max_rel=31.008819580078125, norm_rel=0.024936063215136528, ref_abs_avg=19.555301666259766, test_abs_avg=19.591785430908203
production_forward grad[36] vs paper_forward: mean_abs=0.5874672532081604, max_abs=3.53125, mean_rel=0.15883496403694153, max_rel=619.3460693359375, norm_rel=0.023825906217098236, ref_abs_avg=24.695737838745117, test_abs_avg=24.697147369384766
production_forward grad[37] vs paper_forward: mean_abs=0.5743836164474487, max_abs=4.0, mean_rel=0.138548344373703, max_rel=488.59930419921875, norm_rel=0.02363526076078415, ref_abs_avg=24.410198211669922, test_abs_avg=24.411222457885742
production_forward grad[38] vs paper_forward: mean_abs=0.46692657470703125, max_abs=1.6875, mean_rel=0.07754742354154587, max_rel=1.5637352466583252, norm_rel=0.02462044358253479, ref_abs_avg=18.968460083007812, test_abs_avg=18.955097198486328
production_forward grad[39] vs paper_forward: mean_abs=0.5527617931365967, max_abs=3.42578125, mean_rel=0.1589982807636261, max_rel=812.4810791015625, norm_rel=0.023763272911310196, ref_abs_avg=23.310592651367188, test_abs_avg=23.311134338378906
production_forward grad[40] vs paper_forward: mean_abs=0.5454140901565552, max_abs=3.375, mean_rel=0.14425581693649292, max_rel=962.0507202148438, norm_rel=0.023424379527568817, ref_abs_avg=23.32522964477539, test_abs_avg=23.326202392578125
production_forward grad[41] vs paper_forward: mean_abs=0.4331512451171875, max_abs=1.5625, mean_rel=0.08793386816978455, max_rel=4.201146602630615, norm_rel=0.023350592702627182, ref_abs_avg=18.79631996154785, test_abs_avg=18.8236083984375
production_forward grad[42] vs paper_forward: mean_abs=0.5231595635414124, max_abs=3.25, mean_rel=0.15462970733642578, max_rel=1095.479736328125, norm_rel=0.02344774827361107, ref_abs_avg=22.35704803466797, test_abs_avg=22.35834503173828
production_forward grad[43] vs paper_forward: mean_abs=0.516570508480072, max_abs=3.0, mean_rel=0.14784927666187286, max_rel=603.7279663085938, norm_rel=0.023238778114318848, ref_abs_avg=22.242189407348633, test_abs_avg=22.243724822998047
production_forward grad[44] vs paper_forward: mean_abs=0.42826271057128906, max_abs=1.75, mean_rel=0.13965728878974915, max_rel=16.643802642822266, norm_rel=0.02446730062365532, ref_abs_avg=17.98442268371582, test_abs_avg=17.988420486450195
production_forward grad[45] vs paper_forward: mean_abs=0.4989076256752014, max_abs=3.5, mean_rel=0.14696446061134338, max_rel=1255.8045654296875, norm_rel=0.023155217990279198, ref_abs_avg=21.57836151123047, test_abs_avg=21.578086853027344
production_forward grad[46] vs paper_forward: mean_abs=0.4899987578392029, max_abs=3.125, mean_rel=0.15107229351997375, max_rel=734.6392211914062, norm_rel=0.022867711260914803, ref_abs_avg=21.45242691040039, test_abs_avg=21.452774047851562
production_forward grad[47] vs paper_forward: mean_abs=0.38211703300476074, max_abs=1.4375, mean_rel=0.09342728555202484, max_rel=3.6878764629364014, norm_rel=0.022054683417081833, ref_abs_avg=17.740386962890625, test_abs_avg=17.72474479675293
production_forward grad[48] vs paper_forward: mean_abs=0.47528183460235596, max_abs=3.125, mean_rel=0.14985278248786926, max_rel=991.4962158203125, norm_rel=0.022830236703157425, ref_abs_avg=20.8786563873291, test_abs_avg=20.8785343170166
production_forward grad[49] vs paper_forward: mean_abs=0.46865999698638916, max_abs=3.0, mean_rel=0.14677296578884125, max_rel=870.875244140625, norm_rel=0.02258610911667347, ref_abs_avg=20.750621795654297, test_abs_avg=20.750537872314453
production_forward grad[50] vs paper_forward: mean_abs=0.43698549270629883, max_abs=1.75, mean_rel=0.146856427192688, max_rel=10.605006217956543, norm_rel=0.023480257019400597, ref_abs_avg=18.61873435974121, test_abs_avg=18.577911376953125
production_forward grad[51] vs paper_forward: mean_abs=0.5322175621986389, max_abs=3.5, mean_rel=0.15621986985206604, max_rel=868.9288940429688, norm_rel=0.024387510493397713, ref_abs_avg=21.8885440826416, test_abs_avg=21.887466430664062
production_forward grad[52] vs paper_forward: mean_abs=0.5210571885108948, max_abs=3.75, mean_rel=0.16804096102714539, max_rel=784.1707153320312, norm_rel=0.024038562551140785, ref_abs_avg=21.71375274658203, test_abs_avg=21.708242416381836
production_forward grad[53] vs paper_forward: mean_abs=0.38287878036499023, max_abs=1.625, mean_rel=0.08585572242736816, max_rel=5.093002796173096, norm_rel=0.022198794409632683, ref_abs_avg=17.351213455200195, test_abs_avg=17.360498428344727
production_forward grad[54] vs paper_forward: mean_abs=0.4887125492095947, max_abs=3.375, mean_rel=0.15990014374256134, max_rel=987.7212524414062, norm_rel=0.023797351866960526, ref_abs_avg=20.532020568847656, test_abs_avg=20.532678604125977
production_forward grad[55] vs paper_forward: mean_abs=0.47878193855285645, max_abs=3.0, mean_rel=0.155208021402359, max_rel=729.7294921875, norm_rel=0.02390226535499096, ref_abs_avg=20.06279945373535, test_abs_avg=20.060611724853516
production_forward grad[56] vs paper_forward: mean_abs=0.3920440673828125, max_abs=1.46875, mean_rel=0.16315029561519623, max_rel=32.244544982910156, norm_rel=0.022354166954755783, ref_abs_avg=17.514633178710938, test_abs_avg=17.51711654663086
production_forward grad[57] vs paper_forward: mean_abs=0.4524106979370117, max_abs=2.884765625, mean_rel=0.14974720776081085, max_rel=792.6417846679688, norm_rel=0.023393776267766953, ref_abs_avg=19.349082946777344, test_abs_avg=19.348873138427734
production_forward grad[58] vs paper_forward: mean_abs=0.44590330123901367, max_abs=2.796875, mean_rel=0.162073016166687, max_rel=703.6746826171875, norm_rel=0.02373402938246727, ref_abs_avg=18.8068790435791, test_abs_avg=18.81316375732422
production_forward grad[59] vs paper_forward: mean_abs=0.3441281318664551, max_abs=1.5, mean_rel=0.08274644613265991, max_rel=8.089642524719238, norm_rel=0.023332301527261734, ref_abs_avg=14.927916526794434, test_abs_avg=14.928436279296875
production_forward grad[60] vs paper_forward: mean_abs=0.4224926829338074, max_abs=3.0, mean_rel=0.1454905867576599, max_rel=883.1285400390625, norm_rel=0.022732578217983246, ref_abs_avg=18.545475006103516, test_abs_avg=18.546510696411133
production_forward grad[61] vs paper_forward: mean_abs=0.41647499799728394, max_abs=3.046875, mean_rel=0.14802813529968262, max_rel=601.2137451171875, norm_rel=0.02276274375617504, ref_abs_avg=18.251110076904297, test_abs_avg=18.249486923217773
production_forward grad[62] vs paper_forward: mean_abs=0.3223090171813965, max_abs=1.4775390625, mean_rel=0.11748133599758148, max_rel=23.671415328979492, norm_rel=0.022465821355581284, ref_abs_avg=14.672613143920898, test_abs_avg=14.662410736083984
production_forward grad[63] vs paper_forward: mean_abs=0.3951735198497772, max_abs=2.75, mean_rel=0.14593055844306946, max_rel=668.2405395507812, norm_rel=0.022641543298959732, ref_abs_avg=17.46147346496582, test_abs_avg=17.462533950805664
production_forward grad[64] vs paper_forward: mean_abs=0.39015597105026245, max_abs=2.546875, mean_rel=0.15214143693447113, max_rel=659.0939331054688, norm_rel=0.022736510261893272, ref_abs_avg=17.200626373291016, test_abs_avg=17.20130157470703
production_forward grad[65] vs paper_forward: mean_abs=0.328102707862854, max_abs=1.375, mean_rel=0.14756853878498077, max_rel=31.52278709411621, norm_rel=0.022546079009771347, ref_abs_avg=14.308759689331055, test_abs_avg=14.315507888793945
production_forward grad[66] vs paper_forward: mean_abs=0.3767940104007721, max_abs=3.5625, mean_rel=0.14498580992221832, max_rel=488.57061767578125, norm_rel=0.022192763164639473, ref_abs_avg=16.940820693969727, test_abs_avg=16.941638946533203
production_forward grad[67] vs paper_forward: mean_abs=0.36979639530181885, max_abs=2.5, mean_rel=0.13744720816612244, max_rel=403.4490051269531, norm_rel=0.022133976221084595, ref_abs_avg=16.694252014160156, test_abs_avg=16.68833351135254
production_forward grad[68] vs paper_forward: mean_abs=0.289287805557251, max_abs=1.0, mean_rel=0.15380337834358215, max_rel=48.38980484008789, norm_rel=0.02198915369808674, ref_abs_avg=12.962664604187012, test_abs_avg=12.963102340698242
production_forward grad[69] vs paper_forward: mean_abs=0.3558775782585144, max_abs=3.0625, mean_rel=0.1449660360813141, max_rel=610.2265014648438, norm_rel=0.02182622440159321, ref_abs_avg=16.28363800048828, test_abs_avg=16.28423309326172
production_forward grad[70] vs paper_forward: mean_abs=0.3494366407394409, max_abs=2.3125, mean_rel=0.1331339180469513, max_rel=501.0915832519531, norm_rel=0.021447740495204926, ref_abs_avg=16.222196578979492, test_abs_avg=16.217609405517578
production_forward grad[71] vs paper_forward: mean_abs=0.2690104842185974, max_abs=0.90625, mean_rel=0.15555857121944427, max_rel=37.073123931884766, norm_rel=0.020380202680826187, ref_abs_avg=13.232671737670898, test_abs_avg=13.22821044921875
production_forward grad[72] vs paper_forward: mean_abs=0.3397739827632904, max_abs=2.5, mean_rel=0.14083857834339142, max_rel=1028.582275390625, norm_rel=0.02150968462228775, ref_abs_avg=15.785080909729004, test_abs_avg=15.785540580749512
production_forward grad[73] vs paper_forward: mean_abs=0.3340314030647278, max_abs=2.5, mean_rel=0.13780128955841064, max_rel=496.00445556640625, norm_rel=0.02167869359254837, ref_abs_avg=15.407249450683594, test_abs_avg=15.41742992401123
production_forward grad[74] vs paper_forward: mean_abs=0.32690268754959106, max_abs=1.25, mean_rel=0.28001290559768677, max_rel=68.08586883544922, norm_rel=0.02452833764255047, ref_abs_avg=13.125419616699219, test_abs_avg=13.087499618530273
production_forward grad[75] vs paper_forward: mean_abs=0.3745548129081726, max_abs=2.75, mean_rel=0.15017995238304138, max_rel=736.0267944335938, norm_rel=0.023179002106189728, ref_abs_avg=16.16042709350586, test_abs_avg=16.159543991088867
production_forward grad[76] vs paper_forward: mean_abs=0.3689541816711426, max_abs=3.0, mean_rel=0.1468898057937622, max_rel=712.7999267578125, norm_rel=0.022687319666147232, ref_abs_avg=16.284515380859375, test_abs_avg=16.283836364746094
production_forward grad[77] vs paper_forward: mean_abs=0.27754759788513184, max_abs=1.125, mean_rel=0.10003195703029633, max_rel=13.036953926086426, norm_rel=0.021826360374689102, ref_abs_avg=13.126516342163086, test_abs_avg=13.102823257446289
production_forward grad[78] vs paper_forward: mean_abs=0.3438672423362732, max_abs=3.5, mean_rel=0.14513179659843445, max_rel=964.6476440429688, norm_rel=0.022495461627840996, ref_abs_avg=15.263236999511719, test_abs_avg=15.262200355529785
production_forward grad[79] vs paper_forward: mean_abs=0.33535653352737427, max_abs=2.9375, mean_rel=0.1461683213710785, max_rel=629.2782592773438, norm_rel=0.022449839860200882, ref_abs_avg=14.986063003540039, test_abs_avg=14.990062713623047
production_forward grad[80] vs paper_forward: mean_abs=0.2815440893173218, max_abs=1.1875, mean_rel=0.1498296856880188, max_rel=17.531585693359375, norm_rel=0.02318659983575344, ref_abs_avg=11.962303161621094, test_abs_avg=11.974080085754395
production_forward grad[81] vs paper_forward: mean_abs=0.31966882944107056, max_abs=3.9375, mean_rel=0.13728481531143188, max_rel=610.060302734375, norm_rel=0.021913019940257072, ref_abs_avg=14.592573165893555, test_abs_avg=14.59192943572998
production_forward grad[82] vs paper_forward: mean_abs=0.31037014722824097, max_abs=2.75, mean_rel=0.13722214102745056, max_rel=387.2003479003906, norm_rel=0.02144441194832325, ref_abs_avg=14.463117599487305, test_abs_avg=14.461585998535156
production_forward grad[83] vs paper_forward: mean_abs=0.2353365421295166, max_abs=0.875, mean_rel=0.15912100672721863, max_rel=24.354509353637695, norm_rel=0.020093794912099838, ref_abs_avg=11.614078521728516, test_abs_avg=11.634010314941406
production_forward grad[84] vs paper_forward: mean_abs=0.29889556765556335, max_abs=2.75, mean_rel=0.1330055147409439, max_rel=1061.207275390625, norm_rel=0.021243123337626457, ref_abs_avg=14.089521408081055, test_abs_avg=14.0887451171875
production_forward grad[85] vs paper_forward: mean_abs=0.29502081871032715, max_abs=2.9375, mean_rel=0.1310269832611084, max_rel=398.20843505859375, norm_rel=0.02118883840739727, ref_abs_avg=13.928613662719727, test_abs_avg=13.926097869873047
production_forward grad[86] vs paper_forward: mean_abs=0.22164487838745117, max_abs=0.8125, mean_rel=0.12892799079418182, max_rel=8.687725067138672, norm_rel=0.020819835364818573, ref_abs_avg=10.566583633422852, test_abs_avg=10.570867538452148
production_forward grad[87] vs paper_forward: mean_abs=0.2810625433921814, max_abs=2.375, mean_rel=0.1283704936504364, max_rel=486.8863525390625, norm_rel=0.02061714231967926, ref_abs_avg=13.677539825439453, test_abs_avg=13.678547859191895
production_forward grad[88] vs paper_forward: mean_abs=0.27128878235816956, max_abs=2.5, mean_rel=0.11850018054246902, max_rel=343.6331787109375, norm_rel=0.019849106669425964, ref_abs_avg=13.665019989013672, test_abs_avg=13.6657133102417
production_forward grad[89] vs paper_forward: mean_abs=0.21319371461868286, max_abs=1.0, mean_rel=0.16298335790634155, max_rel=45.95429992675781, norm_rel=0.018987666815519333, ref_abs_avg=11.024014472961426, test_abs_avg=11.012019157409668
production_forward grad[90] vs paper_forward: mean_abs=0.2645081579685211, max_abs=3.0, mean_rel=0.12492284923791885, max_rel=662.2184448242188, norm_rel=0.02026466280221939, ref_abs_avg=13.131539344787598, test_abs_avg=13.130294799804688
production_forward grad[91] vs paper_forward: mean_abs=0.2603253722190857, max_abs=2.25, mean_rel=0.127120703458786, max_rel=858.2860107421875, norm_rel=0.020186390727758408, ref_abs_avg=12.9771146774292, test_abs_avg=12.972599029541016
production_forward grad[92] vs paper_forward: mean_abs=0.21064366400241852, max_abs=0.828125, mean_rel=0.14502300322055817, max_rel=36.79355239868164, norm_rel=0.019464129582047462, ref_abs_avg=11.136255264282227, test_abs_avg=11.140893936157227
production_forward grad[93] vs paper_forward: mean_abs=0.2534124255180359, max_abs=2.625, mean_rel=0.12261909991502762, max_rel=493.382568359375, norm_rel=0.01980624534189701, ref_abs_avg=12.935240745544434, test_abs_avg=12.934953689575195
production_forward grad[94] vs paper_forward: mean_abs=0.24710214138031006, max_abs=2.25, mean_rel=0.12113478779792786, max_rel=540.1416625976562, norm_rel=0.019709615036845207, ref_abs_avg=12.665876388549805, test_abs_avg=12.660004615783691
production_forward grad[95] vs paper_forward: mean_abs=0.20241546630859375, max_abs=0.7265625, mean_rel=0.07201413810253143, max_rel=3.580341339111328, norm_rel=0.019132791087031364, ref_abs_avg=10.804096221923828, test_abs_avg=10.800870895385742
production_forward grad[96] vs paper_forward: mean_abs=0.2458028942346573, max_abs=2.5, mean_rel=0.1197921559214592, max_rel=788.67578125, norm_rel=0.019516581669449806, ref_abs_avg=12.79182243347168, test_abs_avg=12.791831016540527
production_forward grad[97] vs paper_forward: mean_abs=0.23458683490753174, max_abs=2.375, mean_rel=0.11748284101486206, max_rel=368.7021484375, norm_rel=0.019673097878694534, ref_abs_avg=12.134767532348633, test_abs_avg=12.133954048156738
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016288021579384804, max_abs=0.03125
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008665509521961212, max_abs=0.3515625, mean_rel=0.0744246393442154, max_rel=82.4024887084961, norm_rel=0.020368153229355812, ref_abs_avg=0.45913049578666687, test_abs_avg=0.45912492275238037
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.225346565246582, max_abs=40.0, mean_rel=0.1529492735862732, max_rel=168.16888427734375, norm_rel=0.02116352878510952, ref_abs_avg=222.1953125, test_abs_avg=222.1686553955078
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.9036355018615723, max_abs=3.5, mean_rel=0.17214414477348328, max_rel=39.04715347290039, norm_rel=0.025154227390885353, ref_abs_avg=36.246002197265625, test_abs_avg=36.22720718383789
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.0936092138290405, max_abs=7.0, mean_rel=0.16643807291984558, max_rel=1297.959228515625, norm_rel=0.023685215041041374, ref_abs_avg=46.43198776245117, test_abs_avg=46.433406829833984
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.0666446685791016, max_abs=8.0, mean_rel=0.1606713831424713, max_rel=1089.4281005859375, norm_rel=0.023370610550045967, ref_abs_avg=45.87211608886719, test_abs_avg=45.87147903442383
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.7651176452636719, max_abs=3.25, mean_rel=0.09398659318685532, max_rel=6.394613265991211, norm_rel=0.02296445518732071, ref_abs_avg=33.35903549194336, test_abs_avg=33.331573486328125
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9586185812950134, max_abs=6.0, mean_rel=0.1798294186592102, max_rel=3065.053955078125, norm_rel=0.023445462808012962, ref_abs_avg=41.08968734741211, test_abs_avg=41.093685150146484
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.9392625093460083, max_abs=5.5, mean_rel=0.16870476305484772, max_rel=1620.399658203125, norm_rel=0.023290187120437622, ref_abs_avg=40.584659576416016, test_abs_avg=40.591522216796875
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7177376747131348, max_abs=2.75, mean_rel=0.12126246094703674, max_rel=21.41585350036621, norm_rel=0.02318899892270565, ref_abs_avg=31.33993911743164, test_abs_avg=31.36396026611328
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.8756717443466187, max_abs=5.5, mean_rel=0.15436896681785583, max_rel=1408.2939453125, norm_rel=0.023168569430708885, ref_abs_avg=37.92661666870117, test_abs_avg=37.92717742919922
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8508880138397217, max_abs=5.25, mean_rel=0.14218775928020477, max_rel=704.214111328125, norm_rel=0.023068735376000404, ref_abs_avg=37.09149932861328, test_abs_avg=37.09672546386719
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6206152439117432, max_abs=2.5, mean_rel=0.18997111916542053, max_rel=52.49683380126953, norm_rel=0.021086251363158226, ref_abs_avg=29.890243530273438, test_abs_avg=29.977920532226562
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.8010244369506836, max_abs=4.75, mean_rel=0.16335353255271912, max_rel=1151.5968017578125, norm_rel=0.02301718480885029, ref_abs_avg=34.97532653808594, test_abs_avg=34.976768493652344
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7846982479095459, max_abs=5.0, mean_rel=0.1514536738395691, max_rel=618.074462890625, norm_rel=0.02277384325861931, ref_abs_avg=34.594703674316406, test_abs_avg=34.58897018432617
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.6246471405029297, max_abs=2.515625, mean_rel=0.10511553287506104, max_rel=7.852396488189697, norm_rel=0.022785166278481483, ref_abs_avg=27.788984298706055, test_abs_avg=27.7738094329834
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7493894100189209, max_abs=4.75, mean_rel=0.15800833702087402, max_rel=1610.52880859375, norm_rel=0.022822344675660133, ref_abs_avg=32.99018859863281, test_abs_avg=32.98895263671875
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.7341263890266418, max_abs=4.25, mean_rel=0.14809966087341309, max_rel=710.2462158203125, norm_rel=0.022664597257971764, ref_abs_avg=32.55807876586914, test_abs_avg=32.555259704589844
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.5914831161499023, max_abs=2.75, mean_rel=0.07223866879940033, max_rel=3.5836541652679443, norm_rel=0.022957807406783104, ref_abs_avg=25.95720672607422, test_abs_avg=25.945632934570312
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.706079363822937, max_abs=4.5, mean_rel=0.1530969738960266, max_rel=879.8319702148438, norm_rel=0.022703006863594055, ref_abs_avg=31.229530334472656, test_abs_avg=31.229984283447266
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.6885296106338501, max_abs=4.5, mean_rel=0.14835578203201294, max_rel=1650.6378173828125, norm_rel=0.022512957453727722, ref_abs_avg=30.747360229492188, test_abs_avg=30.747699737548828
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.536613941192627, max_abs=2.0, mean_rel=0.10450846701860428, max_rel=7.360695838928223, norm_rel=0.02218182571232319, ref_abs_avg=24.46982192993164, test_abs_avg=24.431392669677734
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6663264632225037, max_abs=4.375, mean_rel=0.1597365140914917, max_rel=1281.0888671875, norm_rel=0.02253270521759987, ref_abs_avg=29.63863754272461, test_abs_avg=29.636821746826172
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6485264301300049, max_abs=4.0, mean_rel=0.1560068279504776, max_rel=1261.40966796875, norm_rel=0.022394564002752304, ref_abs_avg=29.050079345703125, test_abs_avg=29.04598045349121
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.5143215656280518, max_abs=2.125, mean_rel=0.12270402908325195, max_rel=11.9326810836792, norm_rel=0.022069532424211502, ref_abs_avg=23.581113815307617, test_abs_avg=23.61888885498047
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.633642315864563, max_abs=4.125, mean_rel=0.1404358595609665, max_rel=1250.91796875, norm_rel=0.022375650703907013, ref_abs_avg=28.44811248779297, test_abs_avg=28.44701385498047
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.618798017501831, max_abs=4.0, mean_rel=0.1399715542793274, max_rel=764.463134765625, norm_rel=0.022053180262446404, ref_abs_avg=28.21490478515625, test_abs_avg=28.21263885498047
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6431798338890076, max_abs=2.5625, mean_rel=0.10758611559867859, max_rel=4.345571041107178, norm_rel=0.025885391980409622, ref_abs_avg=24.908193588256836, test_abs_avg=24.87198257446289
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7365894317626953, max_abs=5.0, mean_rel=0.16789481043815613, max_rel=1768.49951171875, norm_rel=0.02459145337343216, ref_abs_avg=30.046279907226562, test_abs_avg=30.04724884033203
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.7184370756149292, max_abs=4.4375, mean_rel=0.1538984477519989, max_rel=1557.7105712890625, norm_rel=0.024279698729515076, ref_abs_avg=29.70751953125, test_abs_avg=29.705886840820312
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.5339908599853516, max_abs=2.125, mean_rel=0.08983244001865387, max_rel=7.385684013366699, norm_rel=0.022563757374882698, ref_abs_avg=23.704418182373047, test_abs_avg=23.70693588256836
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.6921981573104858, max_abs=4.18359375, mean_rel=0.17246770858764648, max_rel=1313.9456787109375, norm_rel=0.0248470026999712, ref_abs_avg=27.929651260375977, test_abs_avg=27.92827606201172
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6810720562934875, max_abs=4.0, mean_rel=0.15422901511192322, max_rel=586.922119140625, norm_rel=0.024729235097765923, ref_abs_avg=27.645709991455078, test_abs_avg=27.650283813476562
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5025386810302734, max_abs=2.0, mean_rel=0.07205716520547867, max_rel=2.101138114929199, norm_rel=0.024245597422122955, ref_abs_avg=20.771821975708008, test_abs_avg=20.758365631103516
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.6369996666908264, max_abs=4.2392578125, mean_rel=0.15878747403621674, max_rel=1360.480224609375, norm_rel=0.024670686572790146, ref_abs_avg=25.878665924072266, test_abs_avg=25.87789535522461
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6265453100204468, max_abs=4.0, mean_rel=0.1515839695930481, max_rel=526.8494262695312, norm_rel=0.024713465943932533, ref_abs_avg=25.426624298095703, test_abs_avg=25.424591064453125
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.4925704002380371, max_abs=1.75, mean_rel=0.18593505024909973, max_rel=24.669239044189453, norm_rel=0.025218917056918144, ref_abs_avg=19.555301666259766, test_abs_avg=19.592254638671875
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.6001186966896057, max_abs=4.0, mean_rel=0.16298037767410278, max_rel=885.9730224609375, norm_rel=0.02432873472571373, ref_abs_avg=24.695737838745117, test_abs_avg=24.695104598999023
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5852289795875549, max_abs=3.75, mean_rel=0.1421022117137909, max_rel=386.2265930175781, norm_rel=0.02406170405447483, ref_abs_avg=24.410198211669922, test_abs_avg=24.411209106445312
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.4684721529483795, max_abs=1.75, mean_rel=0.08032196760177612, max_rel=2.679858684539795, norm_rel=0.024529805406928062, ref_abs_avg=18.968460083007812, test_abs_avg=18.93511962890625
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.563518226146698, max_abs=3.34765625, mean_rel=0.1631757616996765, max_rel=932.9152221679688, norm_rel=0.024230698123574257, ref_abs_avg=23.310592651367188, test_abs_avg=23.3104190826416
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5542513132095337, max_abs=3.5, mean_rel=0.14706234633922577, max_rel=1239.3895263671875, norm_rel=0.023793291300535202, ref_abs_avg=23.32522964477539, test_abs_avg=23.323219299316406
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4309368133544922, max_abs=1.625, mean_rel=0.08939415216445923, max_rel=4.17485237121582, norm_rel=0.023485172539949417, ref_abs_avg=18.79631996154785, test_abs_avg=18.82575798034668
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5328099727630615, max_abs=3.75, mean_rel=0.15766137838363647, max_rel=1088.2720947265625, norm_rel=0.023866629227995872, ref_abs_avg=22.35704803466797, test_abs_avg=22.357053756713867
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5261144638061523, max_abs=3.27734375, mean_rel=0.14897826313972473, max_rel=482.45703125, norm_rel=0.02368994429707527, ref_abs_avg=22.242189407348633, test_abs_avg=22.241670608520508
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.425992488861084, max_abs=2.046875, mean_rel=0.12158438563346863, max_rel=12.137961387634277, norm_rel=0.024248316884040833, ref_abs_avg=17.98442268371582, test_abs_avg=17.982568740844727
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.5072023868560791, max_abs=3.5, mean_rel=0.15312501788139343, max_rel=1115.7767333984375, norm_rel=0.023523932322859764, ref_abs_avg=21.57836151123047, test_abs_avg=21.578319549560547
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.49903520941734314, max_abs=3.5, mean_rel=0.15364277362823486, max_rel=651.4721069335938, norm_rel=0.023285912349820137, ref_abs_avg=21.45242691040039, test_abs_avg=21.452547073364258
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.38466405868530273, max_abs=1.703125, mean_rel=0.08801572024822235, max_rel=3.767021417617798, norm_rel=0.02206108160316944, ref_abs_avg=17.740386962890625, test_abs_avg=17.728490829467773
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.4825146198272705, max_abs=3.1953125, mean_rel=0.15266528725624084, max_rel=854.4418334960938, norm_rel=0.023160843178629875, ref_abs_avg=20.8786563873291, test_abs_avg=20.87822723388672
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.476482093334198, max_abs=3.5, mean_rel=0.1453137993812561, max_rel=827.5175170898438, norm_rel=0.02293924055993557, ref_abs_avg=20.750621795654297, test_abs_avg=20.7496337890625
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.44398927688598633, max_abs=1.5625, mean_rel=0.14829841256141663, max_rel=16.115360260009766, norm_rel=0.024047493934631348, ref_abs_avg=18.61873435974121, test_abs_avg=18.59046173095703
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5404070615768433, max_abs=3.875, mean_rel=0.15720373392105103, max_rel=756.5482788085938, norm_rel=0.024759339168667793, ref_abs_avg=21.8885440826416, test_abs_avg=21.887332916259766
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5272899270057678, max_abs=3.6875, mean_rel=0.16080687940120697, max_rel=679.8439331054688, norm_rel=0.024317296221852303, ref_abs_avg=21.71375274658203, test_abs_avg=21.710594177246094
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.39804935455322266, max_abs=1.625, mean_rel=0.0835837796330452, max_rel=3.2246458530426025, norm_rel=0.02275840938091278, ref_abs_avg=17.351213455200195, test_abs_avg=17.368980407714844
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.4953809976577759, max_abs=3.375, mean_rel=0.16294452548027039, max_rel=1046.5823974609375, norm_rel=0.024112051352858543, ref_abs_avg=20.532020568847656, test_abs_avg=20.531396865844727
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.48439255356788635, max_abs=3.21875, mean_rel=0.15835076570510864, max_rel=518.4627685546875, norm_rel=0.024164360016584396, ref_abs_avg=20.06279945373535, test_abs_avg=20.060073852539062
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.39131879806518555, max_abs=1.40625, mean_rel=0.17001061141490936, max_rel=28.737041473388672, norm_rel=0.02231345698237419, ref_abs_avg=17.514633178710938, test_abs_avg=17.523075103759766
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.4585009813308716, max_abs=3.0, mean_rel=0.15246891975402832, max_rel=683.5401000976562, norm_rel=0.023697534576058388, ref_abs_avg=19.349082946777344, test_abs_avg=19.348848342895508
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.4528971314430237, max_abs=3.0, mean_rel=0.15979453921318054, max_rel=700.6008911132812, norm_rel=0.024113686755299568, ref_abs_avg=18.8068790435791, test_abs_avg=18.812702178955078
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.3619260787963867, max_abs=1.5, mean_rel=0.09113234281539917, max_rel=10.29758071899414, norm_rel=0.024738486856222153, ref_abs_avg=14.927916526794434, test_abs_avg=14.924314498901367
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.42889347672462463, max_abs=3.0, mean_rel=0.15115953981876373, max_rel=865.9164428710938, norm_rel=0.023057302460074425, ref_abs_avg=18.545475006103516, test_abs_avg=18.546560287475586
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.42137449979782104, max_abs=2.578125, mean_rel=0.15411874651908875, max_rel=861.1272583007812, norm_rel=0.02303461544215679, ref_abs_avg=18.251110076904297, test_abs_avg=18.248348236083984
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.3522762656211853, max_abs=1.25, mean_rel=0.1275801956653595, max_rel=22.13102912902832, norm_rel=0.023749008774757385, ref_abs_avg=14.672613143920898, test_abs_avg=14.657326698303223
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.40050822496414185, max_abs=2.8125, mean_rel=0.1488265097141266, max_rel=782.8538818359375, norm_rel=0.022934801876544952, ref_abs_avg=17.46147346496582, test_abs_avg=17.462419509887695
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.3943043351173401, max_abs=2.5, mean_rel=0.1546608954668045, max_rel=514.8006591796875, norm_rel=0.022968342527747154, ref_abs_avg=17.200626373291016, test_abs_avg=17.20123291015625
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.3242541551589966, max_abs=1.25, mean_rel=0.17016778886318207, max_rel=41.16237258911133, norm_rel=0.02230384200811386, ref_abs_avg=14.308759689331055, test_abs_avg=14.318808555603027
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.381157249212265, max_abs=3.875, mean_rel=0.14644944667816162, max_rel=629.4237670898438, norm_rel=0.022444451227784157, ref_abs_avg=16.940820693969727, test_abs_avg=16.94194221496582
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.3731880187988281, max_abs=2.5, mean_rel=0.13724097609519958, max_rel=473.80902099609375, norm_rel=0.02235659584403038, ref_abs_avg=16.694252014160156, test_abs_avg=16.687877655029297
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.3016691505908966, max_abs=1.0625, mean_rel=0.1960071325302124, max_rel=57.34569549560547, norm_rel=0.022659458220005035, ref_abs_avg=12.962664604187012, test_abs_avg=12.983415603637695
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.3600730299949646, max_abs=3.0, mean_rel=0.15034253895282745, max_rel=662.37646484375, norm_rel=0.022062605246901512, ref_abs_avg=16.28363800048828, test_abs_avg=16.283720016479492
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.35489559173583984, max_abs=2.5625, mean_rel=0.13557365536689758, max_rel=429.37701416015625, norm_rel=0.021785344928503036, ref_abs_avg=16.222196578979492, test_abs_avg=16.21573257446289
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.2687423825263977, max_abs=1.25, mean_rel=0.16337929666042328, max_rel=51.19826889038086, norm_rel=0.02020098827779293, ref_abs_avg=13.232671737670898, test_abs_avg=13.224251747131348
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3426416218280792, max_abs=3.0, mean_rel=0.1445673704147339, max_rel=976.0250244140625, norm_rel=0.021689584478735924, ref_abs_avg=15.785080909729004, test_abs_avg=15.785600662231445
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.3342427611351013, max_abs=3.0, mean_rel=0.13650831580162048, max_rel=431.39373779296875, norm_rel=0.021673614159226418, ref_abs_avg=15.407249450683594, test_abs_avg=15.416557312011719
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.33640801906585693, max_abs=1.25, mean_rel=0.2873312830924988, max_rel=58.488670349121094, norm_rel=0.025395570322871208, ref_abs_avg=13.125419616699219, test_abs_avg=13.083277702331543
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.3794299364089966, max_abs=2.875, mean_rel=0.1533922553062439, max_rel=660.954345703125, norm_rel=0.02346883714199066, ref_abs_avg=16.16042709350586, test_abs_avg=16.158597946166992
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3741101026535034, max_abs=3.0, mean_rel=0.1476304531097412, max_rel=594.0184936523438, norm_rel=0.023016300052404404, ref_abs_avg=16.284515380859375, test_abs_avg=16.28057861328125
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.2950165271759033, max_abs=1.375, mean_rel=0.12826505303382874, max_rel=19.25798797607422, norm_rel=0.023081041872501373, ref_abs_avg=13.126516342163086, test_abs_avg=13.111886978149414
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.3478562831878662, max_abs=3.0, mean_rel=0.1478087306022644, max_rel=891.8609008789062, norm_rel=0.022745799273252487, ref_abs_avg=15.263236999511719, test_abs_avg=15.2612886428833
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.33921533823013306, max_abs=2.71875, mean_rel=0.14053557813167572, max_rel=291.0978088378906, norm_rel=0.02268119528889656, ref_abs_avg=14.986063003540039, test_abs_avg=14.987770080566406
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.2695465683937073, max_abs=1.1875, mean_rel=0.16865572333335876, max_rel=20.969934463500977, norm_rel=0.02239706553518772, ref_abs_avg=11.962303161621094, test_abs_avg=11.971221923828125
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.32275187969207764, max_abs=4.0, mean_rel=0.14025238156318665, max_rel=784.739501953125, norm_rel=0.022124554961919785, ref_abs_avg=14.592573165893555, test_abs_avg=14.59144401550293
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.31153151392936707, max_abs=2.75, mean_rel=0.13625164330005646, max_rel=657.3395385742188, norm_rel=0.021565843373537064, ref_abs_avg=14.463117599487305, test_abs_avg=14.460853576660156
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.23987340927124023, max_abs=0.8125, mean_rel=0.14383374154567719, max_rel=16.45574951171875, norm_rel=0.02032284438610077, ref_abs_avg=11.614078521728516, test_abs_avg=11.627706527709961
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.30198606848716736, max_abs=2.625, mean_rel=0.1341831386089325, max_rel=968.0003662109375, norm_rel=0.02145436592400074, ref_abs_avg=14.089521408081055, test_abs_avg=14.088239669799805
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.29670965671539307, max_abs=2.875, mean_rel=0.1295868456363678, max_rel=315.35107421875, norm_rel=0.021302860230207443, ref_abs_avg=13.928613662719727, test_abs_avg=13.928203582763672
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.2297622561454773, max_abs=0.875, mean_rel=0.14506351947784424, max_rel=9.478517532348633, norm_rel=0.022060422226786613, ref_abs_avg=10.566583633422852, test_abs_avg=10.571165084838867
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.28315240144729614, max_abs=2.5, mean_rel=0.13064971566200256, max_rel=605.2835083007812, norm_rel=0.0207692701369524, ref_abs_avg=13.677539825439453, test_abs_avg=13.677821159362793
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.2760084867477417, max_abs=2.5, mean_rel=0.12133121490478516, max_rel=367.8465881347656, norm_rel=0.02018379233777523, ref_abs_avg=13.665019989013672, test_abs_avg=13.665857315063477
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.20634609460830688, max_abs=0.9375, mean_rel=0.18986335396766663, max_rel=62.561161041259766, norm_rel=0.01858932338654995, ref_abs_avg=11.024014472961426, test_abs_avg=11.007190704345703
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.2659386396408081, max_abs=3.0, mean_rel=0.12652263045310974, max_rel=713.39794921875, norm_rel=0.020354973152279854, ref_abs_avg=13.131539344787598, test_abs_avg=13.13031005859375
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.2603699266910553, max_abs=2.3125, mean_rel=0.1282871663570404, max_rel=447.435546875, norm_rel=0.020145785063505173, ref_abs_avg=12.9771146774292, test_abs_avg=12.973457336425781
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.21041040122509003, max_abs=0.859375, mean_rel=0.16142547130584717, max_rel=26.463685989379883, norm_rel=0.019109655171632767, ref_abs_avg=11.136255264282227, test_abs_avg=11.141585350036621
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.25438013672828674, max_abs=2.75, mean_rel=0.12212958931922913, max_rel=566.4351196289062, norm_rel=0.019868500530719757, ref_abs_avg=12.935240745544434, test_abs_avg=12.934672355651855
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.24683931469917297, max_abs=2.25, mean_rel=0.118997722864151, max_rel=540.1416625976562, norm_rel=0.019670261070132256, ref_abs_avg=12.665876388549805, test_abs_avg=12.658304214477539
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.20455646514892578, max_abs=0.75, mean_rel=0.08201061189174652, max_rel=7.8625712394714355, norm_rel=0.019092725589871407, ref_abs_avg=10.804096221923828, test_abs_avg=10.80197811126709
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.24658271670341492, max_abs=2.375, mean_rel=0.12003020942211151, max_rel=579.2645263671875, norm_rel=0.0195767842233181, ref_abs_avg=12.79182243347168, test_abs_avg=12.791495323181152
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.2355390191078186, max_abs=2.25, mean_rel=0.11952720582485199, max_rel=432.2713928222656, norm_rel=0.01978929527103901, ref_abs_avg=12.134767532348633, test_abs_avg=12.13308048248291
liger_forward vs paper_forward output: mean_abs=0.0001480155042372644, max_abs=0.03125
liger_forward grad[0] vs paper_forward: mean_abs=0.003523176070302725, max_abs=0.21875, mean_rel=0.025398002937436104, max_rel=41.57236099243164, norm_rel=0.00957316905260086, ref_abs_avg=0.45913049578666687, test_abs_avg=0.459109902381897
liger_forward grad[1] vs paper_forward: mean_abs=1.5353738069534302, max_abs=16.0, mean_rel=0.041761789470911026, max_rel=25.70049285888672, norm_rel=0.0066374544985592365, ref_abs_avg=222.1953125, test_abs_avg=222.1721954345703
liger_forward grad[2] vs paper_forward: mean_abs=0.33610010147094727, max_abs=1.125, mean_rel=0.07111100852489471, max_rel=18.030109405517578, norm_rel=0.009477945975959301, ref_abs_avg=36.246002197265625, test_abs_avg=36.22927474975586
liger_forward grad[3] vs paper_forward: mean_abs=0.38962429761886597, max_abs=3.0, mean_rel=0.06075138598680496, max_rel=509.56719970703125, norm_rel=0.008721589110791683, ref_abs_avg=46.43198776245117, test_abs_avg=46.431488037109375
liger_forward grad[4] vs paper_forward: mean_abs=0.3775457441806793, max_abs=3.0, mean_rel=0.05929764360189438, max_rel=659.310791015625, norm_rel=0.008561016991734505, ref_abs_avg=45.87211608886719, test_abs_avg=45.87177276611328
liger_forward grad[5] vs paper_forward: mean_abs=0.2663002014160156, max_abs=1.25, mean_rel=0.029723655432462692, max_rel=1.151253342628479, norm_rel=0.008308392018079758, ref_abs_avg=33.35903549194336, test_abs_avg=33.386474609375
liger_forward grad[6] vs paper_forward: mean_abs=0.33586615324020386, max_abs=2.140625, mean_rel=0.05951879918575287, max_rel=673.9564819335938, norm_rel=0.008507044985890388, ref_abs_avg=41.08968734741211, test_abs_avg=41.08990478515625
liger_forward grad[7] vs paper_forward: mean_abs=0.32219499349594116, max_abs=2.0, mean_rel=0.056595463305711746, max_rel=568.305908203125, norm_rel=0.008289088495075703, ref_abs_avg=40.584659576416016, test_abs_avg=40.58373260498047
liger_forward grad[8] vs paper_forward: mean_abs=0.24030017852783203, max_abs=0.96875, mean_rel=0.028180137276649475, max_rel=1.9811604022979736, norm_rel=0.007932709529995918, ref_abs_avg=31.33993911743164, test_abs_avg=31.328218460083008
liger_forward grad[9] vs paper_forward: mean_abs=0.3031723201274872, max_abs=2.0, mean_rel=0.05533631145954132, max_rel=600.6765747070312, norm_rel=0.008321603760123253, ref_abs_avg=37.92661666870117, test_abs_avg=37.927284240722656
liger_forward grad[10] vs paper_forward: mean_abs=0.29206907749176025, max_abs=2.0, mean_rel=0.04853919893503189, max_rel=325.3966979980469, norm_rel=0.008222554810345173, ref_abs_avg=37.09149932861328, test_abs_avg=37.09184646606445
liger_forward grad[11] vs paper_forward: mean_abs=0.2354423999786377, max_abs=0.875, mean_rel=0.03662949800491333, max_rel=5.367275714874268, norm_rel=0.008059906773269176, ref_abs_avg=29.890243530273438, test_abs_avg=29.89674186706543
liger_forward grad[12] vs paper_forward: mean_abs=0.27531373500823975, max_abs=2.0, mean_rel=0.055082716047763824, max_rel=377.24517822265625, norm_rel=0.008212856948375702, ref_abs_avg=34.97532653808594, test_abs_avg=34.975894927978516
liger_forward grad[13] vs paper_forward: mean_abs=0.26533591747283936, max_abs=1.5, mean_rel=0.05550549924373627, max_rel=786.2046508789062, norm_rel=0.008023788221180439, ref_abs_avg=34.594703674316406, test_abs_avg=34.59341812133789
liger_forward grad[14] vs paper_forward: mean_abs=0.22702789306640625, max_abs=0.875, mean_rel=0.03807083144783974, max_rel=4.454415321350098, norm_rel=0.008435571566224098, ref_abs_avg=27.788984298706055, test_abs_avg=27.78614616394043
liger_forward grad[15] vs paper_forward: mean_abs=0.2543281316757202, max_abs=1.5, mean_rel=0.05196937546133995, max_rel=551.337890625, norm_rel=0.00805206410586834, ref_abs_avg=32.99018859863281, test_abs_avg=32.99004364013672
liger_forward grad[16] vs paper_forward: mean_abs=0.24571320414543152, max_abs=1.5, mean_rel=0.05166144669055939, max_rel=335.4402770996094, norm_rel=0.00790476892143488, ref_abs_avg=32.55807876586914, test_abs_avg=32.557228088378906
liger_forward grad[17] vs paper_forward: mean_abs=0.20674610137939453, max_abs=1.0, mean_rel=0.024867413565516472, max_rel=1.2185050249099731, norm_rel=0.008287983946502209, ref_abs_avg=25.95720672607422, test_abs_avg=25.939916610717773
liger_forward grad[18] vs paper_forward: mean_abs=0.23641805350780487, max_abs=1.5, mean_rel=0.05224187299609184, max_rel=278.0806579589844, norm_rel=0.007923614233732224, ref_abs_avg=31.229530334472656, test_abs_avg=31.230697631835938
liger_forward grad[19] vs paper_forward: mean_abs=0.22793729603290558, max_abs=1.625, mean_rel=0.05001700296998024, max_rel=263.8276062011719, norm_rel=0.007776343263685703, ref_abs_avg=30.747360229492188, test_abs_avg=30.748634338378906
liger_forward grad[20] vs paper_forward: mean_abs=0.1898350715637207, max_abs=0.6875, mean_rel=0.032253045588731766, max_rel=1.7040486335754395, norm_rel=0.007985911332070827, ref_abs_avg=24.46982192993164, test_abs_avg=24.454195022583008
liger_forward grad[21] vs paper_forward: mean_abs=0.22069373726844788, max_abs=1.5, mean_rel=0.05195038020610809, max_rel=379.9342346191406, norm_rel=0.007795434445142746, ref_abs_avg=29.63863754272461, test_abs_avg=29.638648986816406
liger_forward grad[22] vs paper_forward: mean_abs=0.21249154210090637, max_abs=1.25, mean_rel=0.051654115319252014, max_rel=451.1669006347656, norm_rel=0.007676979061216116, ref_abs_avg=29.050079345703125, test_abs_avg=29.049097061157227
liger_forward grad[23] vs paper_forward: mean_abs=0.1757211685180664, max_abs=0.75, mean_rel=0.04005594551563263, max_rel=4.19526481628418, norm_rel=0.007859153673052788, ref_abs_avg=23.581113815307617, test_abs_avg=23.582855224609375
liger_forward grad[24] vs paper_forward: mean_abs=0.20707835257053375, max_abs=1.25, mean_rel=0.04565852880477905, max_rel=218.98658752441406, norm_rel=0.007644762750715017, ref_abs_avg=28.44811248779297, test_abs_avg=28.44776153564453
liger_forward grad[25] vs paper_forward: mean_abs=0.19978533685207367, max_abs=1.5, mean_rel=0.04586571082472801, max_rel=354.77130126953125, norm_rel=0.007464321795850992, ref_abs_avg=28.21490478515625, test_abs_avg=28.213973999023438
liger_forward grad[26] vs paper_forward: mean_abs=0.1894702911376953, max_abs=0.75, mean_rel=0.03188049793243408, max_rel=1.8896907567977905, norm_rel=0.007851164788007736, ref_abs_avg=24.908193588256836, test_abs_avg=24.903610229492188
liger_forward grad[27] vs paper_forward: mean_abs=0.22923590242862701, max_abs=1.5, mean_rel=0.05193454027175903, max_rel=413.493896484375, norm_rel=0.007974772714078426, ref_abs_avg=30.046279907226562, test_abs_avg=30.04599952697754
liger_forward grad[28] vs paper_forward: mean_abs=0.22012537717819214, max_abs=1.5, mean_rel=0.04639056324958801, max_rel=425.3360595703125, norm_rel=0.007781476713716984, ref_abs_avg=29.70751953125, test_abs_avg=29.70941162109375
liger_forward grad[29] vs paper_forward: mean_abs=0.17568111419677734, max_abs=0.5625, mean_rel=0.035864703357219696, max_rel=5.885093688964844, norm_rel=0.007651507388800383, ref_abs_avg=23.704418182373047, test_abs_avg=23.703693389892578
liger_forward grad[30] vs paper_forward: mean_abs=0.20699508488178253, max_abs=1.5, mean_rel=0.04968643933534622, max_rel=326.66802978515625, norm_rel=0.007765791844576597, ref_abs_avg=27.929651260375977, test_abs_avg=27.928905487060547
liger_forward grad[31] vs paper_forward: mean_abs=0.197170227766037, max_abs=1.25, mean_rel=0.04616658762097359, max_rel=258.49359130859375, norm_rel=0.007505375891923904, ref_abs_avg=27.645709991455078, test_abs_avg=27.645580291748047
liger_forward grad[32] vs paper_forward: mean_abs=0.1639575958251953, max_abs=0.6875, mean_rel=0.025187550112605095, max_rel=1.226660132408142, norm_rel=0.00825948640704155, ref_abs_avg=20.771821975708008, test_abs_avg=20.76016616821289
liger_forward grad[33] vs paper_forward: mean_abs=0.18777556717395782, max_abs=1.5, mean_rel=0.04695603623986244, max_rel=437.1702880859375, norm_rel=0.007609130814671516, ref_abs_avg=25.878665924072266, test_abs_avg=25.878427505493164
liger_forward grad[34] vs paper_forward: mean_abs=0.18025867640972137, max_abs=1.3125, mean_rel=0.04545488953590393, max_rel=246.85813903808594, norm_rel=0.007472873665392399, ref_abs_avg=25.426624298095703, test_abs_avg=25.42650032043457
liger_forward grad[35] vs paper_forward: mean_abs=0.14269304275512695, max_abs=0.5625, mean_rel=0.0670228973031044, max_rel=15.022050857543945, norm_rel=0.007672206033021212, ref_abs_avg=19.555301666259766, test_abs_avg=19.560020446777344
liger_forward grad[36] vs paper_forward: mean_abs=0.17430587112903595, max_abs=1.25, mean_rel=0.045060545206069946, max_rel=211.47467041015625, norm_rel=0.007411935832351446, ref_abs_avg=24.695737838745117, test_abs_avg=24.69510269165039
liger_forward grad[37] vs paper_forward: mean_abs=0.16745181381702423, max_abs=1.25, mean_rel=0.040857501327991486, max_rel=119.06818389892578, norm_rel=0.007238818798214197, ref_abs_avg=24.410198211669922, test_abs_avg=24.410083770751953
liger_forward grad[38] vs paper_forward: mean_abs=0.13639259338378906, max_abs=0.5, mean_rel=0.02559605985879898, max_rel=0.8286788463592529, norm_rel=0.007524217013269663, ref_abs_avg=18.968460083007812, test_abs_avg=18.965078353881836
liger_forward grad[39] vs paper_forward: mean_abs=0.1629263460636139, max_abs=1.0625, mean_rel=0.04800339415669441, max_rel=259.2184753417969, norm_rel=0.007354690693318844, ref_abs_avg=23.310592651367188, test_abs_avg=23.310556411743164
liger_forward grad[40] vs paper_forward: mean_abs=0.1576453149318695, max_abs=1.015625, mean_rel=0.040841586887836456, max_rel=116.39879608154297, norm_rel=0.007135935593396425, ref_abs_avg=23.32522964477539, test_abs_avg=23.325008392333984
liger_forward grad[41] vs paper_forward: mean_abs=0.1329803466796875, max_abs=0.5, mean_rel=0.019185803830623627, max_rel=1.0575100183486938, norm_rel=0.007511119823902845, ref_abs_avg=18.79631996154785, test_abs_avg=18.79595947265625
liger_forward grad[42] vs paper_forward: mean_abs=0.15317797660827637, max_abs=1.0, mean_rel=0.043891169130802155, max_rel=240.3684539794922, norm_rel=0.007216543424874544, ref_abs_avg=22.35704803466797, test_abs_avg=22.357440948486328
liger_forward grad[43] vs paper_forward: mean_abs=0.14735423028469086, max_abs=1.0, mean_rel=0.042712144553661346, max_rel=210.19406127929688, norm_rel=0.00699878903105855, ref_abs_avg=22.242189407348633, test_abs_avg=22.242399215698242
liger_forward grad[44] vs paper_forward: mean_abs=0.12296295166015625, max_abs=0.5, mean_rel=0.02836778573691845, max_rel=2.9483907222747803, norm_rel=0.007322561927139759, ref_abs_avg=17.98442268371582, test_abs_avg=17.986927032470703
liger_forward grad[45] vs paper_forward: mean_abs=0.14529719948768616, max_abs=1.0, mean_rel=0.04374434053897858, max_rel=282.4351501464844, norm_rel=0.007096866145730019, ref_abs_avg=21.57836151123047, test_abs_avg=21.578384399414062
liger_forward grad[46] vs paper_forward: mean_abs=0.13998353481292725, max_abs=1.0, mean_rel=0.043124474585056305, max_rel=192.75999450683594, norm_rel=0.0069037978537380695, ref_abs_avg=21.45242691040039, test_abs_avg=21.452945709228516
liger_forward grad[47] vs paper_forward: mean_abs=0.11350178718566895, max_abs=0.5, mean_rel=0.03422927111387253, max_rel=4.940322399139404, norm_rel=0.006860768888145685, ref_abs_avg=17.740386962890625, test_abs_avg=17.74198341369629
liger_forward grad[48] vs paper_forward: mean_abs=0.13784539699554443, max_abs=1.0, mean_rel=0.044440340250730515, max_rel=312.7087707519531, norm_rel=0.0069846478290855885, ref_abs_avg=20.8786563873291, test_abs_avg=20.878253936767578
liger_forward grad[49] vs paper_forward: mean_abs=0.13271783292293549, max_abs=1.0, mean_rel=0.04231514036655426, max_rel=265.7769775390625, norm_rel=0.006770820822566748, ref_abs_avg=20.750621795654297, test_abs_avg=20.749879837036133
liger_forward grad[50] vs paper_forward: mean_abs=0.12564504146575928, max_abs=0.5, mean_rel=0.045748449862003326, max_rel=3.9696335792541504, norm_rel=0.007230504881590605, ref_abs_avg=18.61873435974121, test_abs_avg=18.62066078186035
liger_forward grad[51] vs paper_forward: mean_abs=0.15503951907157898, max_abs=1.125, mean_rel=0.04636775702238083, max_rel=293.7162780761719, norm_rel=0.007445690222084522, ref_abs_avg=21.8885440826416, test_abs_avg=21.88854217529297
liger_forward grad[52] vs paper_forward: mean_abs=0.1483812928199768, max_abs=1.0, mean_rel=0.04731496050953865, max_rel=267.4613342285156, norm_rel=0.0072095817886292934, ref_abs_avg=21.71375274658203, test_abs_avg=21.714229583740234
liger_forward grad[53] vs paper_forward: mean_abs=0.11006736755371094, max_abs=0.5, mean_rel=0.022497285157442093, max_rel=0.6724235415458679, norm_rel=0.0067473822273314, ref_abs_avg=17.351213455200195, test_abs_avg=17.353412628173828
liger_forward grad[54] vs paper_forward: mean_abs=0.13908109068870544, max_abs=1.0, mean_rel=0.04546668380498886, max_rel=214.05442810058594, norm_rel=0.007136555854231119, ref_abs_avg=20.532020568847656, test_abs_avg=20.531871795654297
liger_forward grad[55] vs paper_forward: mean_abs=0.13375785946846008, max_abs=1.0, mean_rel=0.04451492428779602, max_rel=196.88035583496094, norm_rel=0.007041263394057751, ref_abs_avg=20.06279945373535, test_abs_avg=20.06359100341797
liger_forward grad[56] vs paper_forward: mean_abs=0.11344528198242188, max_abs=0.5, mean_rel=0.037946827709674835, max_rel=6.196162700653076, norm_rel=0.006842652335762978, ref_abs_avg=17.514633178710938, test_abs_avg=17.514707565307617
liger_forward grad[57] vs paper_forward: mean_abs=0.12842845916748047, max_abs=1.0, mean_rel=0.043731383979320526, max_rel=189.21693420410156, norm_rel=0.007006743457168341, ref_abs_avg=19.349082946777344, test_abs_avg=19.34912109375
liger_forward grad[58] vs paper_forward: mean_abs=0.12379743903875351, max_abs=1.0, mean_rel=0.045547716319561005, max_rel=263.63641357421875, norm_rel=0.006963212974369526, ref_abs_avg=18.8068790435791, test_abs_avg=18.806495666503906
liger_forward grad[59] vs paper_forward: mean_abs=0.09840422868728638, max_abs=0.4375, mean_rel=0.0299447663128376, max_rel=3.968156576156616, norm_rel=0.007110217586159706, ref_abs_avg=14.927916526794434, test_abs_avg=14.932795524597168
liger_forward grad[60] vs paper_forward: mean_abs=0.11956708133220673, max_abs=1.0, mean_rel=0.0428757518529892, max_rel=292.87969970703125, norm_rel=0.006807577330619097, ref_abs_avg=18.545475006103516, test_abs_avg=18.54570770263672
liger_forward grad[61] vs paper_forward: mean_abs=0.11526350677013397, max_abs=1.0, mean_rel=0.042724668979644775, max_rel=234.89202880859375, norm_rel=0.006677411030977964, ref_abs_avg=18.251110076904297, test_abs_avg=18.250682830810547
liger_forward grad[62] vs paper_forward: mean_abs=0.08794450759887695, max_abs=0.375, mean_rel=0.027717242017388344, max_rel=5.49966287612915, norm_rel=0.006499346345663071, ref_abs_avg=14.672613143920898, test_abs_avg=14.665749549865723
liger_forward grad[63] vs paper_forward: mean_abs=0.11170967668294907, max_abs=1.0, mean_rel=0.04143141955137253, max_rel=194.74874877929688, norm_rel=0.006775846239179373, ref_abs_avg=17.46147346496582, test_abs_avg=17.46164894104004
liger_forward grad[64] vs paper_forward: mean_abs=0.10786876082420349, max_abs=1.0, mean_rel=0.043329525738954544, max_rel=167.93634033203125, norm_rel=0.006672712042927742, ref_abs_avg=17.200626373291016, test_abs_avg=17.200241088867188
liger_forward grad[65] vs paper_forward: mean_abs=0.08949732780456543, max_abs=0.375, mean_rel=0.04082559421658516, max_rel=10.047929763793945, norm_rel=0.006699560210108757, ref_abs_avg=14.308759689331055, test_abs_avg=14.307394027709961
liger_forward grad[66] vs paper_forward: mean_abs=0.10620453208684921, max_abs=1.0, mean_rel=0.04196029156446457, max_rel=174.61070251464844, norm_rel=0.006631034426391125, ref_abs_avg=16.940820693969727, test_abs_avg=16.940937042236328
liger_forward grad[67] vs paper_forward: mean_abs=0.10153575241565704, max_abs=1.0, mean_rel=0.038803309202194214, max_rel=115.76407623291016, norm_rel=0.006478830240666866, ref_abs_avg=16.694252014160156, test_abs_avg=16.694414138793945
liger_forward grad[68] vs paper_forward: mean_abs=0.08643627166748047, max_abs=0.3125, mean_rel=0.03588121384382248, max_rel=6.931487560272217, norm_rel=0.007007440086454153, ref_abs_avg=12.962664604187012, test_abs_avg=12.969388961791992
liger_forward grad[69] vs paper_forward: mean_abs=0.0997702032327652, max_abs=1.0, mean_rel=0.03970639407634735, max_rel=160.36061096191406, norm_rel=0.0065034800209105015, ref_abs_avg=16.28363800048828, test_abs_avg=16.28376007080078
liger_forward grad[70] vs paper_forward: mean_abs=0.09657295048236847, max_abs=0.75, mean_rel=0.03762942552566528, max_rel=263.1852111816406, norm_rel=0.006341618485748768, ref_abs_avg=16.222196578979492, test_abs_avg=16.221668243408203
liger_forward grad[71] vs paper_forward: mean_abs=0.07431262731552124, max_abs=0.330078125, mean_rel=0.053851913660764694, max_rel=14.67959976196289, norm_rel=0.006163106299936771, ref_abs_avg=13.232671737670898, test_abs_avg=13.224231719970703
liger_forward grad[72] vs paper_forward: mean_abs=0.09417283535003662, max_abs=0.75, mean_rel=0.038492389023303986, max_rel=160.75868225097656, norm_rel=0.006360228173434734, ref_abs_avg=15.785080909729004, test_abs_avg=15.785139083862305
liger_forward grad[73] vs paper_forward: mean_abs=0.09072627127170563, max_abs=0.75, mean_rel=0.03735826909542084, max_rel=138.3997344970703, norm_rel=0.006295158062130213, ref_abs_avg=15.407249450683594, test_abs_avg=15.407964706420898
liger_forward grad[74] vs paper_forward: mean_abs=0.09699690341949463, max_abs=0.375, mean_rel=0.04481044411659241, max_rel=4.003353118896484, norm_rel=0.007710330653935671, ref_abs_avg=13.125419616699219, test_abs_avg=13.124338150024414
liger_forward grad[75] vs paper_forward: mean_abs=0.10743749886751175, max_abs=1.0, mean_rel=0.04421346262097359, max_rel=192.333251953125, norm_rel=0.007002940867096186, ref_abs_avg=16.16042709350586, test_abs_avg=16.160400390625
liger_forward grad[76] vs paper_forward: mean_abs=0.10405668616294861, max_abs=0.75, mean_rel=0.03897596895694733, max_rel=134.32395935058594, norm_rel=0.006781245581805706, ref_abs_avg=16.284515380859375, test_abs_avg=16.284679412841797
liger_forward grad[77] vs paper_forward: mean_abs=0.07394289970397949, max_abs=0.43310546875, mean_rel=0.04434487968683243, max_rel=6.1152167320251465, norm_rel=0.006288429256528616, ref_abs_avg=13.126516342163086, test_abs_avg=13.12291145324707
liger_forward grad[78] vs paper_forward: mean_abs=0.09790507704019547, max_abs=0.75, mean_rel=0.04158702492713928, max_rel=180.57656860351562, norm_rel=0.006781393196433783, ref_abs_avg=15.263236999511719, test_abs_avg=15.263075828552246
liger_forward grad[79] vs paper_forward: mean_abs=0.09517621994018555, max_abs=0.8125, mean_rel=0.04118053615093231, max_rel=185.6078338623047, norm_rel=0.006755551788955927, ref_abs_avg=14.986063003540039, test_abs_avg=14.985308647155762
liger_forward grad[80] vs paper_forward: mean_abs=0.07323431968688965, max_abs=0.25, mean_rel=0.04134707897901535, max_rel=3.6504764556884766, norm_rel=0.006550434045493603, ref_abs_avg=11.962303161621094, test_abs_avg=11.967037200927734
liger_forward grad[81] vs paper_forward: mean_abs=0.0910838395357132, max_abs=1.0, mean_rel=0.039680466055870056, max_rel=249.07666015625, norm_rel=0.006625093054026365, ref_abs_avg=14.592573165893555, test_abs_avg=14.5927095413208
liger_forward grad[82] vs paper_forward: mean_abs=0.0884343832731247, max_abs=1.0, mean_rel=0.03904540091753006, max_rel=122.00077056884766, norm_rel=0.006520215421915054, ref_abs_avg=14.463117599487305, test_abs_avg=14.461393356323242
liger_forward grad[83] vs paper_forward: mean_abs=0.07041791081428528, max_abs=0.25, mean_rel=0.05026840418577194, max_rel=4.120148658752441, norm_rel=0.00645234202966094, ref_abs_avg=11.614078521728516, test_abs_avg=11.613133430480957
liger_forward grad[84] vs paper_forward: mean_abs=0.0854048952460289, max_abs=0.75, mean_rel=0.03913106769323349, max_rel=355.4344482421875, norm_rel=0.0064543792977929115, ref_abs_avg=14.089521408081055, test_abs_avg=14.089558601379395
liger_forward grad[85] vs paper_forward: mean_abs=0.08310790359973907, max_abs=0.75, mean_rel=0.03743959963321686, max_rel=148.68580627441406, norm_rel=0.006355800200253725, ref_abs_avg=13.928613662719727, test_abs_avg=13.929402351379395
liger_forward grad[86] vs paper_forward: mean_abs=0.06633710861206055, max_abs=0.265625, mean_rel=0.03839518502354622, max_rel=3.792027235031128, norm_rel=0.006731513421982527, ref_abs_avg=10.566583633422852, test_abs_avg=10.567148208618164
liger_forward grad[87] vs paper_forward: mean_abs=0.07950817048549652, max_abs=0.75, mean_rel=0.036601461470127106, max_rel=127.06278991699219, norm_rel=0.006238972768187523, ref_abs_avg=13.677539825439453, test_abs_avg=13.677746772766113
liger_forward grad[88] vs paper_forward: mean_abs=0.07779662311077118, max_abs=0.75, mean_rel=0.034536898136138916, max_rel=137.26116943359375, norm_rel=0.006124542560428381, ref_abs_avg=13.665019989013672, test_abs_avg=13.66586685180664
liger_forward grad[89] vs paper_forward: mean_abs=0.0602993369102478, max_abs=0.28125, mean_rel=0.03089076280593872, max_rel=4.607662677764893, norm_rel=0.006009049713611603, ref_abs_avg=11.024014472961426, test_abs_avg=11.022953033447266
liger_forward grad[90] vs paper_forward: mean_abs=0.07483848184347153, max_abs=1.0, mean_rel=0.036150574684143066, max_rel=165.43820190429688, norm_rel=0.006140375044196844, ref_abs_avg=13.131539344787598, test_abs_avg=13.131556510925293
liger_forward grad[91] vs paper_forward: mean_abs=0.07319211959838867, max_abs=0.75, mean_rel=0.03518340736627579, max_rel=85.27041625976562, norm_rel=0.006106868386268616, ref_abs_avg=12.9771146774292, test_abs_avg=12.976934432983398
liger_forward grad[92] vs paper_forward: mean_abs=0.06273792684078217, max_abs=0.25, mean_rel=0.06332384794950485, max_rel=10.149768829345703, norm_rel=0.006101895123720169, ref_abs_avg=11.136255264282227, test_abs_avg=11.135612487792969
liger_forward grad[93] vs paper_forward: mean_abs=0.0713956356048584, max_abs=1.0, mean_rel=0.034682370722293854, max_rel=214.66351318359375, norm_rel=0.0059940749779343605, ref_abs_avg=12.935240745544434, test_abs_avg=12.93506145477295
liger_forward grad[94] vs paper_forward: mean_abs=0.06978064775466919, max_abs=0.75, mean_rel=0.03491027653217316, max_rel=118.00926208496094, norm_rel=0.006024105940014124, ref_abs_avg=12.665876388549805, test_abs_avg=12.664031982421875
liger_forward grad[95] vs paper_forward: mean_abs=0.05706024169921875, max_abs=0.25, mean_rel=0.01683712750673294, max_rel=0.8862230777740479, norm_rel=0.00578599376603961, ref_abs_avg=10.804096221923828, test_abs_avg=10.807247161865234
liger_forward grad[96] vs paper_forward: mean_abs=0.06851357221603394, max_abs=1.0, mean_rel=0.033238258212804794, max_rel=231.99093627929688, norm_rel=0.0058666723780334, ref_abs_avg=12.79182243347168, test_abs_avg=12.791703224182129
liger_forward grad[97] vs paper_forward: mean_abs=0.06445834040641785, max_abs=0.75, mean_rel=0.034495942294597626, max_rel=244.03321838378906, norm_rel=0.005840370897203684, ref_abs_avg=12.134767532348633, test_abs_avg=12.13507080078125
identity layers + randn queries
paper_forward fwd+bwd:  112.807 ms
paper_forward bwd-only: 88.913 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB
torch_compile_phases_forward fwd+bwd:  48.531 ms
torch_compile_phases_forward bwd-only: 39.459 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.754 GiB
liger_forward fwd+bwd:  65.479 ms
liger_forward bwd-only: 53.088 ms
liger_forward peak allocated: fwd=7.727 GiB, fwd+bwd=7.727 GiB
liger_forward peak reserved:  fwd=7.775 GiB, fwd+bwd=8.088 GiB
production_forward fwd+bwd:  33.805 ms
production_forward bwd-only: 28.841 ms
production_forward peak allocated: fwd=1.174 GiB, fwd+bwd=5.176 GiB
production_forward peak reserved:  fwd=1.238 GiB, fwd+bwd=5.238 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016267582541331649, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.00814579613506794, max_abs=0.4375, mean_rel=0.07131712138652802, max_rel=85.27703094482422, norm_rel=0.01937408186495304, ref_abs_avg=0.4537889361381531, test_abs_avg=0.4538002014160156
production_forward grad[1] vs paper_forward: mean_abs=4.998895645141602, max_abs=40.0, mean_rel=0.16539645195007324, max_rel=512.3779296875, norm_rel=0.019777290523052216, ref_abs_avg=222.54330444335938, test_abs_avg=222.5536346435547
production_forward grad[2] vs paper_forward: mean_abs=0.8767464756965637, max_abs=4.0, mean_rel=0.3060038685798645, max_rel=102.19412994384766, norm_rel=0.023857248947024345, ref_abs_avg=37.14860534667969, test_abs_avg=37.16288757324219
production_forward grad[3] vs paper_forward: mean_abs=1.0341315269470215, max_abs=9.0, mean_rel=0.1697678118944168, max_rel=2400.509765625, norm_rel=0.022577757015824318, ref_abs_avg=46.08859634399414, test_abs_avg=46.09063720703125
production_forward grad[4] vs paper_forward: mean_abs=1.0003628730773926, max_abs=6.25, mean_rel=0.16736936569213867, max_rel=1775.1602783203125, norm_rel=0.022168872877955437, ref_abs_avg=45.374664306640625, test_abs_avg=45.3727912902832
production_forward grad[5] vs paper_forward: mean_abs=0.7143192291259766, max_abs=3.25, mean_rel=0.07326281070709229, max_rel=10.350422859191895, norm_rel=0.02257540076971054, ref_abs_avg=33.188201904296875, test_abs_avg=33.229583740234375
production_forward grad[6] vs paper_forward: mean_abs=0.8937381505966187, max_abs=5.375, mean_rel=0.16059166193008423, max_rel=1668.123291015625, norm_rel=0.022227132692933083, ref_abs_avg=40.45201110839844, test_abs_avg=40.45249938964844
production_forward grad[7] vs paper_forward: mean_abs=0.8701173663139343, max_abs=5.1875, mean_rel=0.15476374328136444, max_rel=1413.0948486328125, norm_rel=0.021811222657561302, ref_abs_avg=40.12800598144531, test_abs_avg=40.12761688232422
production_forward grad[8] vs paper_forward: mean_abs=0.7151308059692383, max_abs=2.5625, mean_rel=0.11518420279026031, max_rel=13.901463508605957, norm_rel=0.022431982681155205, ref_abs_avg=32.132957458496094, test_abs_avg=32.139137268066406
production_forward grad[9] vs paper_forward: mean_abs=0.8189778327941895, max_abs=5.5, mean_rel=0.15459050238132477, max_rel=1914.88037109375, norm_rel=0.022173404693603516, ref_abs_avg=37.146080017089844, test_abs_avg=37.146583557128906
production_forward grad[10] vs paper_forward: mean_abs=0.7970632314682007, max_abs=5.0, mean_rel=0.1426427960395813, max_rel=788.7308959960938, norm_rel=0.02183659002184868, ref_abs_avg=36.69757843017578, test_abs_avg=36.69697570800781
production_forward grad[11] vs paper_forward: mean_abs=0.626105785369873, max_abs=2.5, mean_rel=0.10813405364751816, max_rel=20.908191680908203, norm_rel=0.022892845794558525, ref_abs_avg=27.578472137451172, test_abs_avg=27.589622497558594
production_forward grad[12] vs paper_forward: mean_abs=0.7595095038414001, max_abs=5.0, mean_rel=0.15283536911010742, max_rel=1279.237060546875, norm_rel=0.02200908586382866, ref_abs_avg=34.70658874511719, test_abs_avg=34.70817565917969
production_forward grad[13] vs paper_forward: mean_abs=0.7413275241851807, max_abs=4.6875, mean_rel=0.14486446976661682, max_rel=1372.2490234375, norm_rel=0.02163294143974781, ref_abs_avg=34.41774368286133, test_abs_avg=34.41996765136719
production_forward grad[14] vs paper_forward: mean_abs=0.5814104080200195, max_abs=2.75, mean_rel=1.4640138149261475, max_rel=709.3898315429688, norm_rel=0.0228364709764719, ref_abs_avg=26.16452407836914, test_abs_avg=26.168392181396484
production_forward grad[15] vs paper_forward: mean_abs=0.7063862681388855, max_abs=4.5, mean_rel=0.15222062170505524, max_rel=1247.656982421875, norm_rel=0.021728938445448875, ref_abs_avg=32.700439453125, test_abs_avg=32.70149230957031
production_forward grad[16] vs paper_forward: mean_abs=0.6935876607894897, max_abs=4.125, mean_rel=0.15313103795051575, max_rel=783.5670776367188, norm_rel=0.021612059324979782, ref_abs_avg=32.27992248535156, test_abs_avg=32.28094482421875
production_forward grad[17] vs paper_forward: mean_abs=0.5922229886054993, max_abs=2.40625, mean_rel=0.08048994839191437, max_rel=4.888848304748535, norm_rel=0.024568233639001846, ref_abs_avg=24.31856346130371, test_abs_avg=24.304367065429688
production_forward grad[18] vs paper_forward: mean_abs=0.6669813990592957, max_abs=4.5, mean_rel=0.14901506900787354, max_rel=1033.961181640625, norm_rel=0.02163538709282875, ref_abs_avg=30.981300354003906, test_abs_avg=30.982513427734375
production_forward grad[19] vs paper_forward: mean_abs=0.6478801369667053, max_abs=4.04296875, mean_rel=0.13760673999786377, max_rel=666.7941284179688, norm_rel=0.021412277594208717, ref_abs_avg=30.419719696044922, test_abs_avg=30.417396545410156
production_forward grad[20] vs paper_forward: mean_abs=0.507897138595581, max_abs=1.96875, mean_rel=0.10524603724479675, max_rel=4.454342842102051, norm_rel=0.02104514092206955, ref_abs_avg=23.747114181518555, test_abs_avg=23.729209899902344
production_forward grad[21] vs paper_forward: mean_abs=0.6293540000915527, max_abs=3.9375, mean_rel=0.14315924048423767, max_rel=1609.5950927734375, norm_rel=0.0215655118227005, ref_abs_avg=29.34557342529297, test_abs_avg=29.348770141601562
production_forward grad[22] vs paper_forward: mean_abs=0.6152139902114868, max_abs=4.5, mean_rel=0.15573209524154663, max_rel=1251.78759765625, norm_rel=0.021330464631319046, ref_abs_avg=29.026811599731445, test_abs_avg=29.029922485351562
production_forward grad[23] vs paper_forward: mean_abs=0.47484350204467773, max_abs=2.0, mean_rel=0.06646359711885452, max_rel=1.875199556350708, norm_rel=0.020302480086684227, ref_abs_avg=23.125001907348633, test_abs_avg=23.089641571044922
production_forward grad[24] vs paper_forward: mean_abs=0.5989645719528198, max_abs=3.84375, mean_rel=0.14878442883491516, max_rel=1048.29052734375, norm_rel=0.021480705589056015, ref_abs_avg=28.016029357910156, test_abs_avg=28.01534652709961
production_forward grad[25] vs paper_forward: mean_abs=0.582957923412323, max_abs=3.75, mean_rel=0.13721910119056702, max_rel=757.7266845703125, norm_rel=0.02103566937148571, ref_abs_avg=27.88187599182129, test_abs_avg=27.878814697265625
production_forward grad[26] vs paper_forward: mean_abs=0.5455026626586914, max_abs=2.3359375, mean_rel=0.13670814037322998, max_rel=21.332862854003906, norm_rel=0.022364560514688492, ref_abs_avg=24.790836334228516, test_abs_avg=24.784412384033203
production_forward grad[27] vs paper_forward: mean_abs=0.6942466497421265, max_abs=4.75, mean_rel=0.14633183181285858, max_rel=883.1027221679688, norm_rel=0.023100370541214943, ref_abs_avg=30.147903442382812, test_abs_avg=30.14756965637207
production_forward grad[28] vs paper_forward: mean_abs=0.6794672012329102, max_abs=4.75, mean_rel=0.15659089386463165, max_rel=888.8652954101562, norm_rel=0.022949861362576485, ref_abs_avg=29.765010833740234, test_abs_avg=29.762981414794922
production_forward grad[29] vs paper_forward: mean_abs=0.5231418609619141, max_abs=2.375, mean_rel=0.1160375326871872, max_rel=9.504457473754883, norm_rel=0.024071892723441124, ref_abs_avg=22.038625717163086, test_abs_avg=22.030517578125
production_forward grad[30] vs paper_forward: mean_abs=0.648338794708252, max_abs=4.0, mean_rel=0.16239415109157562, max_rel=1393.3104248046875, norm_rel=0.023627398535609245, ref_abs_avg=27.573139190673828, test_abs_avg=27.571922302246094
production_forward grad[31] vs paper_forward: mean_abs=0.6366310119628906, max_abs=4.0625, mean_rel=0.16280561685562134, max_rel=973.3333129882812, norm_rel=0.02361084707081318, ref_abs_avg=27.074493408203125, test_abs_avg=27.07638168334961
production_forward grad[32] vs paper_forward: mean_abs=0.4978792369365692, max_abs=2.06640625, mean_rel=0.10750217735767365, max_rel=5.73328161239624, norm_rel=0.025115206837654114, ref_abs_avg=20.309398651123047, test_abs_avg=20.305316925048828
production_forward grad[33] vs paper_forward: mean_abs=0.6042579412460327, max_abs=4.0, mean_rel=0.1556982845067978, max_rel=1485.9505615234375, norm_rel=0.023403184488415718, ref_abs_avg=25.884750366210938, test_abs_avg=25.882518768310547
production_forward grad[34] vs paper_forward: mean_abs=0.5915201902389526, max_abs=4.0, mean_rel=0.14819404482841492, max_rel=888.414306640625, norm_rel=0.02325832098722458, ref_abs_avg=25.563133239746094, test_abs_avg=25.558151245117188
production_forward grad[35] vs paper_forward: mean_abs=0.47292983531951904, max_abs=1.875, mean_rel=0.1316293478012085, max_rel=14.686712265014648, norm_rel=0.023706482723355293, ref_abs_avg=20.06636619567871, test_abs_avg=20.07862091064453
production_forward grad[36] vs paper_forward: mean_abs=0.5640178918838501, max_abs=3.796875, mean_rel=0.14590460062026978, max_rel=1459.2257080078125, norm_rel=0.023151511326432228, ref_abs_avg=24.437381744384766, test_abs_avg=24.436853408813477
production_forward grad[37] vs paper_forward: mean_abs=0.5561177730560303, max_abs=3.75, mean_rel=0.148883655667305, max_rel=973.2484741210938, norm_rel=0.023056618869304657, ref_abs_avg=24.215129852294922, test_abs_avg=24.21524429321289
production_forward grad[38] vs paper_forward: mean_abs=0.43282461166381836, max_abs=1.75, mean_rel=0.06995931267738342, max_rel=2.3140759468078613, norm_rel=0.02368774451315403, ref_abs_avg=18.66110610961914, test_abs_avg=18.66158676147461
production_forward grad[39] vs paper_forward: mean_abs=0.5310391783714294, max_abs=3.640625, mean_rel=0.15877050161361694, max_rel=888.2855834960938, norm_rel=0.023057563230395317, ref_abs_avg=23.1020565032959, test_abs_avg=23.100603103637695
production_forward grad[40] vs paper_forward: mean_abs=0.5195163488388062, max_abs=3.125, mean_rel=0.1425379514694214, max_rel=732.1806640625, norm_rel=0.022691326215863228, ref_abs_avg=23.006858825683594, test_abs_avg=23.0080623626709
production_forward grad[41] vs paper_forward: mean_abs=0.42472410202026367, max_abs=1.625, mean_rel=0.10402721166610718, max_rel=7.221560478210449, norm_rel=0.02359822206199169, ref_abs_avg=17.901836395263672, test_abs_avg=17.924694061279297
production_forward grad[42] vs paper_forward: mean_abs=0.5071492791175842, max_abs=3.234375, mean_rel=0.13987913727760315, max_rel=854.05419921875, norm_rel=0.02260330691933632, ref_abs_avg=22.475770950317383, test_abs_avg=22.475313186645508
production_forward grad[43] vs paper_forward: mean_abs=0.494581401348114, max_abs=3.25, mean_rel=0.14211155474185944, max_rel=797.9248046875, norm_rel=0.02235996723175049, ref_abs_avg=22.21294403076172, test_abs_avg=22.214405059814453
production_forward grad[44] vs paper_forward: mean_abs=0.3895912170410156, max_abs=1.5, mean_rel=0.20388519763946533, max_rel=39.60944366455078, norm_rel=0.02108575962483883, ref_abs_avg=18.05931854248047, test_abs_avg=18.092445373535156
production_forward grad[45] vs paper_forward: mean_abs=0.4792264699935913, max_abs=3.1875, mean_rel=0.15534678101539612, max_rel=1104.995849609375, norm_rel=0.022506091743707657, ref_abs_avg=21.324180603027344, test_abs_avg=21.324268341064453
production_forward grad[46] vs paper_forward: mean_abs=0.47037357091903687, max_abs=3.0, mean_rel=0.1465771496295929, max_rel=942.0276489257812, norm_rel=0.022463077679276466, ref_abs_avg=21.010133743286133, test_abs_avg=21.009429931640625
production_forward grad[47] vs paper_forward: mean_abs=0.3720102310180664, max_abs=1.5, mean_rel=0.07191340625286102, max_rel=3.8965859413146973, norm_rel=0.02175239846110344, ref_abs_avg=17.265851974487305, test_abs_avg=17.278175354003906
production_forward grad[48] vs paper_forward: mean_abs=0.4595889151096344, max_abs=2.9150390625, mean_rel=0.15023908019065857, max_rel=1019.98095703125, norm_rel=0.022219303995370865, ref_abs_avg=20.722787857055664, test_abs_avg=20.72168731689453
production_forward grad[49] vs paper_forward: mean_abs=0.45046621561050415, max_abs=2.875, mean_rel=0.14322590827941895, max_rel=468.9449157714844, norm_rel=0.021913567557930946, ref_abs_avg=20.58928871154785, test_abs_avg=20.58568572998047
production_forward grad[50] vs paper_forward: mean_abs=0.43158861994743347, max_abs=2.119873046875, mean_rel=0.20556262135505676, max_rel=56.71604919433594, norm_rel=0.02557108923792839, ref_abs_avg=17.137943267822266, test_abs_avg=17.106830596923828
production_forward grad[51] vs paper_forward: mean_abs=0.5062117576599121, max_abs=4.1484375, mean_rel=0.15750747919082642, max_rel=979.3955688476562, norm_rel=0.023845210671424866, ref_abs_avg=21.321224212646484, test_abs_avg=21.32118797302246
production_forward grad[52] vs paper_forward: mean_abs=0.49827224016189575, max_abs=3.25, mean_rel=0.15607912838459015, max_rel=817.7274169921875, norm_rel=0.023651927709579468, ref_abs_avg=21.113983154296875, test_abs_avg=21.11691665649414
production_forward grad[53] vs paper_forward: mean_abs=0.36947867274284363, max_abs=1.765625, mean_rel=1.0784200429916382, max_rel=496.88665771484375, norm_rel=0.02334301546216011, ref_abs_avg=16.23176383972168, test_abs_avg=16.200532913208008
production_forward grad[54] vs paper_forward: mean_abs=0.4678926169872284, max_abs=3.125, mean_rel=0.14666375517845154, max_rel=870.38720703125, norm_rel=0.023485668003559113, ref_abs_avg=19.953784942626953, test_abs_avg=19.954463958740234
production_forward grad[55] vs paper_forward: mean_abs=0.46358633041381836, max_abs=3.1875, mean_rel=0.16077592968940735, max_rel=1478.0867919921875, norm_rel=0.023619938641786575, ref_abs_avg=19.657779693603516, test_abs_avg=19.653095245361328
production_forward grad[56] vs paper_forward: mean_abs=0.36805248260498047, max_abs=1.25, mean_rel=0.1812121868133545, max_rel=23.45806312561035, norm_rel=0.022379940375685692, ref_abs_avg=16.088924407958984, test_abs_avg=16.08121109008789
production_forward grad[57] vs paper_forward: mean_abs=0.4388439953327179, max_abs=3.25, mean_rel=0.1579168140888214, max_rel=1016.9195556640625, norm_rel=0.023050477728247643, ref_abs_avg=19.05257225036621, test_abs_avg=19.05199432373047
production_forward grad[58] vs paper_forward: mean_abs=0.43179211020469666, max_abs=3.0, mean_rel=0.15563109517097473, max_rel=1240.6336669921875, norm_rel=0.022997494786977768, ref_abs_avg=18.80270767211914, test_abs_avg=18.80129623413086
production_forward grad[59] vs paper_forward: mean_abs=0.33393409848213196, max_abs=1.3125, mean_rel=0.2999182939529419, max_rel=99.85391235351562, norm_rel=0.022648148238658905, ref_abs_avg=14.636616706848145, test_abs_avg=14.642719268798828
production_forward grad[60] vs paper_forward: mean_abs=0.40916764736175537, max_abs=2.75, mean_rel=0.14763373136520386, max_rel=639.10546875, norm_rel=0.022777331992983818, ref_abs_avg=17.970178604125977, test_abs_avg=17.967430114746094
production_forward grad[61] vs paper_forward: mean_abs=0.40215814113616943, max_abs=2.75, mean_rel=0.1349988430738449, max_rel=566.4854125976562, norm_rel=0.022056685760617256, ref_abs_avg=18.24830436706543, test_abs_avg=18.248641967773438
production_forward grad[62] vs paper_forward: mean_abs=0.3168025016784668, max_abs=1.26171875, mean_rel=0.16256439685821533, max_rel=26.228853225708008, norm_rel=0.0216497965157032, ref_abs_avg=14.903053283691406, test_abs_avg=14.917532920837402
production_forward grad[63] vs paper_forward: mean_abs=0.39278465509414673, max_abs=2.75, mean_rel=0.14327336847782135, max_rel=689.682373046875, norm_rel=0.022157201543450356, ref_abs_avg=17.71542739868164, test_abs_avg=17.71523666381836
production_forward grad[64] vs paper_forward: mean_abs=0.37953418493270874, max_abs=2.625, mean_rel=0.14121341705322266, max_rel=781.5250244140625, norm_rel=0.022138791158795357, ref_abs_avg=17.196731567382812, test_abs_avg=17.200164794921875
production_forward grad[65] vs paper_forward: mean_abs=0.29975461959838867, max_abs=1.5185546875, mean_rel=0.16587978601455688, max_rel=40.895225524902344, norm_rel=0.02231372706592083, ref_abs_avg=13.649078369140625, test_abs_avg=13.662213325500488
production_forward grad[66] vs paper_forward: mean_abs=0.3691025674343109, max_abs=2.59375, mean_rel=0.13897275924682617, max_rel=565.573486328125, norm_rel=0.02182605117559433, ref_abs_avg=16.889799118041992, test_abs_avg=16.889562606811523
production_forward grad[67] vs paper_forward: mean_abs=0.3605814576148987, max_abs=2.59375, mean_rel=0.13321197032928467, max_rel=439.8515319824219, norm_rel=0.02138453535735607, ref_abs_avg=16.89249038696289, test_abs_avg=16.892704010009766
production_forward grad[68] vs paper_forward: mean_abs=0.2761101722717285, max_abs=1.25, mean_rel=0.08400483429431915, max_rel=4.887703895568848, norm_rel=0.02111070789396763, ref_abs_avg=12.985993385314941, test_abs_avg=12.98666763305664
production_forward grad[69] vs paper_forward: mean_abs=0.3532198667526245, max_abs=2.5, mean_rel=0.13672903180122375, max_rel=530.3258666992188, norm_rel=0.021421968936920166, ref_abs_avg=16.481361389160156, test_abs_avg=16.480127334594727
production_forward grad[70] vs paper_forward: mean_abs=0.348577618598938, max_abs=2.375, mean_rel=0.1387673318386078, max_rel=718.9244995117188, norm_rel=0.021311845630407333, ref_abs_avg=16.345840454101562, test_abs_avg=16.3519287109375
production_forward grad[71] vs paper_forward: mean_abs=0.2728252410888672, max_abs=1.0625, mean_rel=0.10367824137210846, max_rel=10.864632606506348, norm_rel=0.021552087739109993, ref_abs_avg=12.604347229003906, test_abs_avg=12.58668041229248
production_forward grad[72] vs paper_forward: mean_abs=0.3366498649120331, max_abs=2.75, mean_rel=0.13511116802692413, max_rel=423.0455627441406, norm_rel=0.02150346152484417, ref_abs_avg=15.65849494934082, test_abs_avg=15.656557083129883
production_forward grad[73] vs paper_forward: mean_abs=0.3308892548084259, max_abs=2.4296875, mean_rel=0.13364267349243164, max_rel=492.3649597167969, norm_rel=0.02083108387887478, ref_abs_avg=15.876197814941406, test_abs_avg=15.874269485473633
production_forward grad[74] vs paper_forward: mean_abs=0.31555071473121643, max_abs=1.1875, mean_rel=0.22296810150146484, max_rel=68.05743408203125, norm_rel=0.022258976474404335, ref_abs_avg=14.351237297058105, test_abs_avg=14.341768264770508
production_forward grad[75] vs paper_forward: mean_abs=0.3648959994316101, max_abs=3.375, mean_rel=0.1481160819530487, max_rel=760.3391723632812, norm_rel=0.022504473105072975, ref_abs_avg=16.207693099975586, test_abs_avg=16.206645965576172
production_forward grad[76] vs paper_forward: mean_abs=0.3617282509803772, max_abs=3.21875, mean_rel=0.1379612684249878, max_rel=439.8065490722656, norm_rel=0.02241458185017109, ref_abs_avg=16.18167495727539, test_abs_avg=16.17658233642578
production_forward grad[77] vs paper_forward: mean_abs=0.2806135416030884, max_abs=1.052734375, mean_rel=0.15310560166835785, max_rel=34.0113410949707, norm_rel=0.022568756714463234, ref_abs_avg=12.452445983886719, test_abs_avg=12.464801788330078
production_forward grad[78] vs paper_forward: mean_abs=0.34148919582366943, max_abs=3.6875, mean_rel=0.13394442200660706, max_rel=686.8477172851562, norm_rel=0.02199937216937542, ref_abs_avg=15.500965118408203, test_abs_avg=15.500530242919922
production_forward grad[79] vs paper_forward: mean_abs=0.33348947763442993, max_abs=2.75, mean_rel=0.1329399198293686, max_rel=927.581298828125, norm_rel=0.0219276025891304, ref_abs_avg=15.218202590942383, test_abs_avg=15.222803115844727
production_forward grad[80] vs paper_forward: mean_abs=0.28100770711898804, max_abs=1.0625, mean_rel=0.09927679598331451, max_rel=7.699604511260986, norm_rel=0.023123381659388542, ref_abs_avg=12.167888641357422, test_abs_avg=12.184385299682617
production_forward grad[81] vs paper_forward: mean_abs=0.315113365650177, max_abs=2.625, mean_rel=0.13661769032478333, max_rel=732.2112426757812, norm_rel=0.02138868346810341, ref_abs_avg=14.755105972290039, test_abs_avg=14.754434585571289
production_forward grad[82] vs paper_forward: mean_abs=0.31337860226631165, max_abs=2.25, mean_rel=0.13452288508415222, max_rel=818.004638671875, norm_rel=0.021379955112934113, ref_abs_avg=14.709151268005371, test_abs_avg=14.70486831665039
production_forward grad[83] vs paper_forward: mean_abs=0.2584528923034668, max_abs=0.9375, mean_rel=0.08474913239479065, max_rel=3.250622272491455, norm_rel=0.022298842668533325, ref_abs_avg=11.660294532775879, test_abs_avg=11.63685417175293
production_forward grad[84] vs paper_forward: mean_abs=0.3027445077896118, max_abs=2.875, mean_rel=0.133564293384552, max_rel=658.6387939453125, norm_rel=0.021157734096050262, ref_abs_avg=14.349746704101562, test_abs_avg=14.349100112915039
production_forward grad[85] vs paper_forward: mean_abs=0.29387974739074707, max_abs=2.0, mean_rel=0.12440349161624908, max_rel=396.27447509765625, norm_rel=0.020469430834054947, ref_abs_avg=14.331979751586914, test_abs_avg=14.330793380737305
production_forward grad[86] vs paper_forward: mean_abs=0.2423722743988037, max_abs=1.0, mean_rel=0.07417751103639603, max_rel=6.633606910705566, norm_rel=0.020076649263501167, ref_abs_avg=12.2567138671875, test_abs_avg=12.255118370056152
production_forward grad[87] vs paper_forward: mean_abs=0.27909985184669495, max_abs=2.375, mean_rel=0.12780524790287018, max_rel=752.2216186523438, norm_rel=0.02025732584297657, ref_abs_avg=13.819620132446289, test_abs_avg=13.818187713623047
production_forward grad[88] vs paper_forward: mean_abs=0.2718719244003296, max_abs=2.1328125, mean_rel=0.1338009238243103, max_rel=628.02099609375, norm_rel=0.020100681111216545, ref_abs_avg=13.634992599487305, test_abs_avg=13.637357711791992
production_forward grad[89] vs paper_forward: mean_abs=0.22626781463623047, max_abs=0.71875, mean_rel=0.08071447908878326, max_rel=8.100889205932617, norm_rel=0.02092125080525875, ref_abs_avg=10.815873146057129, test_abs_avg=10.814376831054688
production_forward grad[90] vs paper_forward: mean_abs=0.26273447275161743, max_abs=2.3125, mean_rel=0.1229664534330368, max_rel=429.2119140625, norm_rel=0.020095106214284897, ref_abs_avg=13.171104431152344, test_abs_avg=13.170129776000977
production_forward grad[91] vs paper_forward: mean_abs=0.2610563039779663, max_abs=2.125, mean_rel=0.11828038096427917, max_rel=682.0960083007812, norm_rel=0.020213935524225235, ref_abs_avg=13.026773452758789, test_abs_avg=13.023601531982422
production_forward grad[92] vs paper_forward: mean_abs=0.19590020179748535, max_abs=1.125, mean_rel=0.06389442086219788, max_rel=2.322410821914673, norm_rel=0.018886663019657135, ref_abs_avg=10.704524993896484, test_abs_avg=10.685297012329102
production_forward grad[93] vs paper_forward: mean_abs=0.2489166259765625, max_abs=2.5, mean_rel=0.12028965353965759, max_rel=633.731689453125, norm_rel=0.01943877339363098, ref_abs_avg=12.943321228027344, test_abs_avg=12.94217300415039
production_forward grad[94] vs paper_forward: mean_abs=0.23897254467010498, max_abs=1.96875, mean_rel=0.1171540915966034, max_rel=467.3997802734375, norm_rel=0.01925397478044033, ref_abs_avg=12.554539680480957, test_abs_avg=12.556604385375977
production_forward grad[95] vs paper_forward: mean_abs=0.20418429374694824, max_abs=0.75, mean_rel=0.06834855675697327, max_rel=6.898824214935303, norm_rel=0.020066531375050545, ref_abs_avg=10.338920593261719, test_abs_avg=10.34323787689209
production_forward grad[96] vs paper_forward: mean_abs=0.2378164529800415, max_abs=2.5, mean_rel=0.11725842952728271, max_rel=778.1492919921875, norm_rel=0.019280336797237396, ref_abs_avg=12.53325080871582, test_abs_avg=12.531765937805176
production_forward grad[97] vs paper_forward: mean_abs=0.22856876254081726, max_abs=2.25, mean_rel=0.11085711419582367, max_rel=346.671630859375, norm_rel=0.018617134541273117, ref_abs_avg=12.405479431152344, test_abs_avg=12.409923553466797
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016283816657960415, max_abs=0.03515625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00848397333174944, max_abs=0.40625, mean_rel=0.07389813661575317, max_rel=83.27948760986328, norm_rel=0.020062319934368134, ref_abs_avg=0.4537889361381531, test_abs_avg=0.45378559827804565
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.079477310180664, max_abs=64.0, mean_rel=0.14204387366771698, max_rel=311.9947204589844, norm_rel=0.02014094404876232, ref_abs_avg=222.54330444335938, test_abs_avg=222.52291870117188
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.8815515637397766, max_abs=3.5, mean_rel=0.101199209690094, max_rel=6.015352725982666, norm_rel=0.024003855884075165, ref_abs_avg=37.14860534667969, test_abs_avg=37.15755081176758
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.069725513458252, max_abs=6.75, mean_rel=0.17261646687984467, max_rel=2634.08935546875, norm_rel=0.02335135079920292, ref_abs_avg=46.08859634399414, test_abs_avg=46.08836364746094
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.0328037738800049, max_abs=6.5, mean_rel=0.17440451681613922, max_rel=1482.1427001953125, norm_rel=0.0228768028318882, ref_abs_avg=45.374664306640625, test_abs_avg=45.3686408996582
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.7258548736572266, max_abs=3.5, mean_rel=0.06932375580072403, max_rel=4.581066131591797, norm_rel=0.022802473977208138, ref_abs_avg=33.188201904296875, test_abs_avg=33.238006591796875
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9245703220367432, max_abs=6.0, mean_rel=0.17498494684696198, max_rel=2149.394287109375, norm_rel=0.022969812154769897, ref_abs_avg=40.45201110839844, test_abs_avg=40.4514274597168
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.8987553119659424, max_abs=5.25, mean_rel=0.1629161834716797, max_rel=1056.3900146484375, norm_rel=0.022516343742609024, ref_abs_avg=40.12800598144531, test_abs_avg=40.129730224609375
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7335672378540039, max_abs=2.75, mean_rel=0.0913703665137291, max_rel=5.3653154373168945, norm_rel=0.022850602865219116, ref_abs_avg=32.132957458496094, test_abs_avg=32.128536224365234
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.8451157808303833, max_abs=5.875, mean_rel=0.16069167852401733, max_rel=1774.0731201171875, norm_rel=0.02288409136235714, ref_abs_avg=37.146080017089844, test_abs_avg=37.14650344848633
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8241180777549744, max_abs=5.25, mean_rel=0.1478424072265625, max_rel=1146.55224609375, norm_rel=0.022576628252863884, ref_abs_avg=36.69757843017578, test_abs_avg=36.69458770751953
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.635310173034668, max_abs=2.875, mean_rel=0.12788516283035278, max_rel=33.22761535644531, norm_rel=0.02331412211060524, ref_abs_avg=27.578472137451172, test_abs_avg=27.57797622680664
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.7823433876037598, max_abs=5.3359375, mean_rel=0.15118178725242615, max_rel=1017.1082763671875, norm_rel=0.022660907357931137, ref_abs_avg=34.70658874511719, test_abs_avg=34.70842742919922
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7643887996673584, max_abs=5.25, mean_rel=0.1530466079711914, max_rel=1585.706298828125, norm_rel=0.022304825484752655, ref_abs_avg=34.41774368286133, test_abs_avg=34.415985107421875
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.6011476516723633, max_abs=2.25, mean_rel=1.1180063486099243, max_rel=522.3850708007812, norm_rel=0.023539867252111435, ref_abs_avg=26.16452407836914, test_abs_avg=26.159793853759766
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7266237735748291, max_abs=4.5, mean_rel=0.15491046011447906, max_rel=1996.394775390625, norm_rel=0.02235172875225544, ref_abs_avg=32.700439453125, test_abs_avg=32.701202392578125
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.7153161764144897, max_abs=4.5, mean_rel=0.16282255947589874, max_rel=1191.54150390625, norm_rel=0.02227691002190113, ref_abs_avg=32.27992248535156, test_abs_avg=32.281761169433594
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.5796279907226562, max_abs=2.5, mean_rel=0.07405610382556915, max_rel=3.3541035652160645, norm_rel=0.024207288399338722, ref_abs_avg=24.31856346130371, test_abs_avg=24.31570053100586
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.6848890781402588, max_abs=4.375, mean_rel=0.15622001886367798, max_rel=1240.1435546875, norm_rel=0.02221197634935379, ref_abs_avg=30.981300354003906, test_abs_avg=30.98271942138672
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.6658854484558105, max_abs=3.89453125, mean_rel=0.14920958876609802, max_rel=715.947509765625, norm_rel=0.021983584389090538, ref_abs_avg=30.419719696044922, test_abs_avg=30.416549682617188
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5183310508728027, max_abs=2.15625, mean_rel=0.09633295983076096, max_rel=3.416670322418213, norm_rel=0.021710006520152092, ref_abs_avg=23.747114181518555, test_abs_avg=23.73535919189453
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6473187804222107, max_abs=4.125, mean_rel=0.14721272885799408, max_rel=1609.5950927734375, norm_rel=0.022170040756464005, ref_abs_avg=29.34557342529297, test_abs_avg=29.34857940673828
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6300915479660034, max_abs=4.75, mean_rel=0.1624934822320938, max_rel=1322.567626953125, norm_rel=0.021838704124093056, ref_abs_avg=29.026811599731445, test_abs_avg=29.03119468688965
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.4979133605957031, max_abs=1.75, mean_rel=0.06771904230117798, max_rel=1.987809181213379, norm_rel=0.021632401272654533, ref_abs_avg=23.125001907348633, test_abs_avg=23.09671974182129
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.6131114363670349, max_abs=4.0, mean_rel=0.1487126648426056, max_rel=870.8230590820312, norm_rel=0.021981287747621536, ref_abs_avg=28.016029357910156, test_abs_avg=28.014699935913086
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.5983808040618896, max_abs=3.47998046875, mean_rel=0.13781951367855072, max_rel=742.0625, norm_rel=0.02156941592693329, ref_abs_avg=27.88187599182129, test_abs_avg=27.87932014465332
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.5699636936187744, max_abs=2.25, mean_rel=0.11576542258262634, max_rel=13.262027740478516, norm_rel=0.023284293711185455, ref_abs_avg=24.790836334228516, test_abs_avg=24.756977081298828
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7136813998222351, max_abs=4.5, mean_rel=0.15352272987365723, max_rel=1164.7308349609375, norm_rel=0.023735173046588898, ref_abs_avg=30.147903442382812, test_abs_avg=30.146596908569336
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.6990597248077393, max_abs=4.75, mean_rel=0.15677087008953094, max_rel=776.3456420898438, norm_rel=0.023621106520295143, ref_abs_avg=29.765010833740234, test_abs_avg=29.76264190673828
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.5447683334350586, max_abs=3.125, mean_rel=0.10804412513971329, max_rel=5.735448360443115, norm_rel=0.024945471435785294, ref_abs_avg=22.038625717163086, test_abs_avg=22.037761688232422
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.6636924743652344, max_abs=4.0, mean_rel=0.16638195514678955, max_rel=1566.0673828125, norm_rel=0.024173548445105553, ref_abs_avg=27.573139190673828, test_abs_avg=27.57185935974121
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6507371664047241, max_abs=4.6875, mean_rel=0.1634265035390854, max_rel=603.9747924804688, norm_rel=0.024104958400130272, ref_abs_avg=27.074493408203125, test_abs_avg=27.07421875
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.48596346378326416, max_abs=1.84375, mean_rel=0.1463555097579956, max_rel=19.44321632385254, norm_rel=0.024961581453680992, ref_abs_avg=20.309398651123047, test_abs_avg=20.30998992919922
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.6175941228866577, max_abs=4.0, mean_rel=0.15820488333702087, max_rel=923.1798706054688, norm_rel=0.023930728435516357, ref_abs_avg=25.884750366210938, test_abs_avg=25.882369995117188
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6054277420043945, max_abs=4.25, mean_rel=0.15117548406124115, max_rel=833.9871826171875, norm_rel=0.023789243772625923, ref_abs_avg=25.563133239746094, test_abs_avg=25.55692481994629
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.48318731784820557, max_abs=2.0, mean_rel=0.3149494528770447, max_rel=106.1025619506836, norm_rel=0.02401638776063919, ref_abs_avg=20.06636619567871, test_abs_avg=20.106218338012695
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.5751652121543884, max_abs=3.75, mean_rel=0.14879924058914185, max_rel=1645.7745361328125, norm_rel=0.02360265888273716, ref_abs_avg=24.437381744384766, test_abs_avg=24.436504364013672
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5675090551376343, max_abs=3.296875, mean_rel=0.1520029753446579, max_rel=1127.7305908203125, norm_rel=0.02350500412285328, ref_abs_avg=24.215129852294922, test_abs_avg=24.215208053588867
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.44484448432922363, max_abs=1.875, mean_rel=0.16963542997837067, max_rel=54.00840377807617, norm_rel=0.024607958272099495, ref_abs_avg=18.66110610961914, test_abs_avg=18.661293029785156
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5415152907371521, max_abs=3.5, mean_rel=0.16021738946437836, max_rel=809.1257934570312, norm_rel=0.023498347029089928, ref_abs_avg=23.1020565032959, test_abs_avg=23.10059928894043
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5311359167098999, max_abs=3.5, mean_rel=0.14461345970630646, max_rel=606.7835083007812, norm_rel=0.023189278319478035, ref_abs_avg=23.006858825683594, test_abs_avg=23.005752563476562
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4314918518066406, max_abs=1.5, mean_rel=0.09953340888023376, max_rel=7.689526081085205, norm_rel=0.02388203702867031, ref_abs_avg=17.901836395263672, test_abs_avg=17.917633056640625
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5158868432044983, max_abs=3.25, mean_rel=0.14689283072948456, max_rel=1076.780517578125, norm_rel=0.02297564409673214, ref_abs_avg=22.475770950317383, test_abs_avg=22.47514533996582
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5044975280761719, max_abs=3.046875, mean_rel=0.14567629992961884, max_rel=682.8495483398438, norm_rel=0.02281852439045906, ref_abs_avg=22.21294403076172, test_abs_avg=22.213871002197266
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.3959110975265503, max_abs=1.6875, mean_rel=0.14971011877059937, max_rel=34.7987060546875, norm_rel=0.02164730802178383, ref_abs_avg=18.05931854248047, test_abs_avg=18.081275939941406
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.48671954870224, max_abs=3.125, mean_rel=0.16024349629878998, max_rel=1839.221435546875, norm_rel=0.022855814546346664, ref_abs_avg=21.324180603027344, test_abs_avg=21.32373046875
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.47715792059898376, max_abs=3.5, mean_rel=0.14760059118270874, max_rel=858.7349853515625, norm_rel=0.02277374640107155, ref_abs_avg=21.010133743286133, test_abs_avg=21.00782012939453
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.38222360610961914, max_abs=1.625, mean_rel=0.10018685460090637, max_rel=14.237727165222168, norm_rel=0.022051874548196793, ref_abs_avg=17.265851974487305, test_abs_avg=17.27002716064453
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.4665544033050537, max_abs=3.5, mean_rel=0.15062260627746582, max_rel=1031.4945068359375, norm_rel=0.02254619263112545, ref_abs_avg=20.722787857055664, test_abs_avg=20.722232818603516
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.4574379324913025, max_abs=2.671875, mean_rel=0.14519469439983368, max_rel=454.11492919921875, norm_rel=0.022245699539780617, ref_abs_avg=20.58928871154785, test_abs_avg=20.584693908691406
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.4313829839229584, max_abs=1.901123046875, mean_rel=0.26802515983581543, max_rel=50.8635139465332, norm_rel=0.025369422510266304, ref_abs_avg=17.137943267822266, test_abs_avg=17.12428092956543
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5160016417503357, max_abs=4.0, mean_rel=0.15654903650283813, max_rel=831.8032836914062, norm_rel=0.02429266646504402, ref_abs_avg=21.321224212646484, test_abs_avg=21.321388244628906
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5058599710464478, max_abs=3.5, mean_rel=0.16170024871826172, max_rel=1103.0677490234375, norm_rel=0.02399718388915062, ref_abs_avg=21.113983154296875, test_abs_avg=21.11522674560547
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.3708573281764984, max_abs=1.8125, mean_rel=0.8343790173530579, max_rel=377.2081604003906, norm_rel=0.02349495142698288, ref_abs_avg=16.23176383972168, test_abs_avg=16.203678131103516
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.4755888879299164, max_abs=3.0, mean_rel=0.14923128485679626, max_rel=1106.7982177734375, norm_rel=0.02385294996201992, ref_abs_avg=19.953784942626953, test_abs_avg=19.9541015625
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.4716697931289673, max_abs=3.25, mean_rel=0.1599346399307251, max_rel=952.294921875, norm_rel=0.02403489500284195, ref_abs_avg=19.657779693603516, test_abs_avg=19.652854919433594
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.3632808327674866, max_abs=1.3125, mean_rel=0.1978697031736374, max_rel=22.464866638183594, norm_rel=0.022559480741620064, ref_abs_avg=16.088924407958984, test_abs_avg=16.080486297607422
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.4455558955669403, max_abs=3.333984375, mean_rel=0.15660911798477173, max_rel=989.9849853515625, norm_rel=0.023379966616630554, ref_abs_avg=19.05257225036621, test_abs_avg=19.051362991333008
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.43778228759765625, max_abs=2.78125, mean_rel=0.15652690827846527, max_rel=1208.49755859375, norm_rel=0.023307187482714653, ref_abs_avg=18.80270767211914, test_abs_avg=18.803125381469727
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.36246350407600403, max_abs=1.375, mean_rel=0.21856316924095154, max_rel=61.28061294555664, norm_rel=0.024233628064393997, ref_abs_avg=14.636616706848145, test_abs_avg=14.640766143798828
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.41485339403152466, max_abs=3.5, mean_rel=0.15080919861793518, max_rel=622.29345703125, norm_rel=0.023095449432730675, ref_abs_avg=17.970178604125977, test_abs_avg=17.967327117919922
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.4086951017379761, max_abs=2.75, mean_rel=0.137870654463768, max_rel=438.5208435058594, norm_rel=0.02238696813583374, ref_abs_avg=18.24830436706543, test_abs_avg=18.250656127929688
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.3298625349998474, max_abs=1.375, mean_rel=0.1593315303325653, max_rel=29.59557342529297, norm_rel=0.022400492802262306, ref_abs_avg=14.903053283691406, test_abs_avg=14.914871215820312
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.39704060554504395, max_abs=2.75, mean_rel=0.148964986205101, max_rel=895.2156372070312, norm_rel=0.022386154159903526, ref_abs_avg=17.71542739868164, test_abs_avg=17.714492797851562
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.3848497271537781, max_abs=2.625, mean_rel=0.14456580579280853, max_rel=698.4586181640625, norm_rel=0.02243940532207489, ref_abs_avg=17.196731567382812, test_abs_avg=17.201805114746094
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.3127784729003906, max_abs=1.3154296875, mean_rel=0.16601336002349854, max_rel=35.42499542236328, norm_rel=0.022919274866580963, ref_abs_avg=13.649078369140625, test_abs_avg=13.656578063964844
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.3731159567832947, max_abs=2.625, mean_rel=0.14167895913124084, max_rel=848.0390625, norm_rel=0.022066958248615265, ref_abs_avg=16.889799118041992, test_abs_avg=16.888614654541016
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.3652210831642151, max_abs=2.53125, mean_rel=0.13581478595733643, max_rel=465.69049072265625, norm_rel=0.021634405478835106, ref_abs_avg=16.89249038696289, test_abs_avg=16.893882751464844
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.27436017990112305, max_abs=1.125, mean_rel=0.07758598029613495, max_rel=5.4734206199646, norm_rel=0.021099185571074486, ref_abs_avg=12.985993385314941, test_abs_avg=12.989591598510742
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.3563781678676605, max_abs=3.0, mean_rel=0.13722215592861176, max_rel=523.8400268554688, norm_rel=0.021611077710986137, ref_abs_avg=16.481361389160156, test_abs_avg=16.480022430419922
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.34986412525177, max_abs=2.375, mean_rel=0.1403021514415741, max_rel=612.8687744140625, norm_rel=0.02136453241109848, ref_abs_avg=16.345840454101562, test_abs_avg=16.351057052612305
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.2844719886779785, max_abs=1.125, mean_rel=0.1106506735086441, max_rel=9.784111976623535, norm_rel=0.02274511195719242, ref_abs_avg=12.604347229003906, test_abs_avg=12.600098609924316
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3394320011138916, max_abs=2.75, mean_rel=0.13441409170627594, max_rel=474.8492126464844, norm_rel=0.021673573181033134, ref_abs_avg=15.65849494934082, test_abs_avg=15.656291961669922
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.3330342173576355, max_abs=2.6796875, mean_rel=0.13364923000335693, max_rel=488.9490966796875, norm_rel=0.020965401083230972, ref_abs_avg=15.876197814941406, test_abs_avg=15.873356819152832
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.3221972584724426, max_abs=1.25, mean_rel=0.35720980167388916, max_rel=133.07188415527344, norm_rel=0.02257516235113144, ref_abs_avg=14.351237297058105, test_abs_avg=14.344377517700195
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.3692132234573364, max_abs=3.484375, mean_rel=0.15081501007080078, max_rel=760.3391723632812, norm_rel=0.02277289144694805, ref_abs_avg=16.207693099975586, test_abs_avg=16.206157684326172
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3645002841949463, max_abs=2.625, mean_rel=0.1389484852552414, max_rel=390.8298034667969, norm_rel=0.02256244234740734, ref_abs_avg=16.18167495727539, test_abs_avg=16.1761417388916
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.2882157564163208, max_abs=1.115234375, mean_rel=0.1730237901210785, max_rel=43.1603889465332, norm_rel=0.022722484543919563, ref_abs_avg=12.452445983886719, test_abs_avg=12.458854675292969
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.34510162472724915, max_abs=3.375, mean_rel=0.13388338685035706, max_rel=736.7177734375, norm_rel=0.022235963493585587, ref_abs_avg=15.500965118408203, test_abs_avg=15.500776290893555
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.33739107847213745, max_abs=3.0, mean_rel=0.1386672556400299, max_rel=927.581298828125, norm_rel=0.02216622233390808, ref_abs_avg=15.218202590942383, test_abs_avg=15.222743034362793
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.2817873954772949, max_abs=1.0, mean_rel=0.09921985864639282, max_rel=8.062888145446777, norm_rel=0.023039568215608597, ref_abs_avg=12.167888641357422, test_abs_avg=12.183786392211914
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.3184250593185425, max_abs=2.78125, mean_rel=0.13826419413089752, max_rel=790.4376220703125, norm_rel=0.021603388711810112, ref_abs_avg=14.755105972290039, test_abs_avg=14.754354476928711
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.31853270530700684, max_abs=2.875, mean_rel=0.13814955949783325, max_rel=936.7698364257812, norm_rel=0.02172689139842987, ref_abs_avg=14.709151268005371, test_abs_avg=14.706257820129395
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.26988983154296875, max_abs=1.03125, mean_rel=0.085930734872818, max_rel=3.594883680343628, norm_rel=0.022764934226870537, ref_abs_avg=11.660294532775879, test_abs_avg=11.640145301818848
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.3054027557373047, max_abs=2.875, mean_rel=0.1327122002840042, max_rel=582.4926147460938, norm_rel=0.021339960396289825, ref_abs_avg=14.349746704101562, test_abs_avg=14.348804473876953
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.29306405782699585, max_abs=2.25, mean_rel=0.12464461475610733, max_rel=358.36541748046875, norm_rel=0.02040438912808895, ref_abs_avg=14.331979751586914, test_abs_avg=14.330150604248047
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.24520254135131836, max_abs=1.0625, mean_rel=0.07563990354537964, max_rel=4.591165542602539, norm_rel=0.020372526720166206, ref_abs_avg=12.2567138671875, test_abs_avg=12.2632474899292
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.28135210275650024, max_abs=2.25, mean_rel=0.13010334968566895, max_rel=573.845458984375, norm_rel=0.02042144536972046, ref_abs_avg=13.819620132446289, test_abs_avg=13.817811965942383
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.27372288703918457, max_abs=2.421875, mean_rel=0.13311371207237244, max_rel=698.9734497070312, norm_rel=0.020251423120498657, ref_abs_avg=13.634992599487305, test_abs_avg=13.637785911560059
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.2275681495666504, max_abs=0.8046875, mean_rel=0.08595984429121017, max_rel=8.676668167114258, norm_rel=0.021281937137246132, ref_abs_avg=10.815873146057129, test_abs_avg=10.816444396972656
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.2638888359069824, max_abs=2.5, mean_rel=0.12377621233463287, max_rel=656.968017578125, norm_rel=0.020180141553282738, ref_abs_avg=13.171104431152344, test_abs_avg=13.170455932617188
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.25873488187789917, max_abs=2.125, mean_rel=0.11557748913764954, max_rel=537.16357421875, norm_rel=0.02001899853348732, ref_abs_avg=13.026773452758789, test_abs_avg=13.023213386535645
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.20167970657348633, max_abs=1.1875, mean_rel=0.0659133642911911, max_rel=1.8898382186889648, norm_rel=0.019473295658826828, ref_abs_avg=10.704524993896484, test_abs_avg=10.703777313232422
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.24985480308532715, max_abs=3.0, mean_rel=0.12109386175870895, max_rel=707.655517578125, norm_rel=0.019500289112329483, ref_abs_avg=12.943321228027344, test_abs_avg=12.942630767822266
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.2419455647468567, max_abs=2.0, mean_rel=0.11630145460367203, max_rel=532.4091186523438, norm_rel=0.019516779109835625, ref_abs_avg=12.554539680480957, test_abs_avg=12.557957649230957
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.20760226249694824, max_abs=0.765625, mean_rel=0.07433915138244629, max_rel=8.907276153564453, norm_rel=0.020141607150435448, ref_abs_avg=10.338920593261719, test_abs_avg=10.352738380432129
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.23818042874336243, max_abs=3.0, mean_rel=0.117489293217659, max_rel=612.181396484375, norm_rel=0.019318506121635437, ref_abs_avg=12.53325080871582, test_abs_avg=12.53187370300293
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.22986088693141937, max_abs=2.40625, mean_rel=0.1138424202799797, max_rel=533.5115356445312, norm_rel=0.018726376816630363, ref_abs_avg=12.405479431152344, test_abs_avg=12.409324645996094
liger_forward vs paper_forward output: mean_abs=0.00014408581773750484, max_abs=0.0234375
liger_forward grad[0] vs paper_forward: mean_abs=0.003473318414762616, max_abs=0.18359375, mean_rel=0.025382976979017258, max_rel=75.57176208496094, norm_rel=0.009522144682705402, ref_abs_avg=0.4537889361381531, test_abs_avg=0.4537685513496399
liger_forward grad[1] vs paper_forward: mean_abs=1.4939056634902954, max_abs=16.0, mean_rel=0.053358808159828186, max_rel=175.07652282714844, norm_rel=0.006362811662256718, ref_abs_avg=222.54330444335938, test_abs_avg=222.5368194580078
liger_forward grad[2] vs paper_forward: mean_abs=0.3357664942741394, max_abs=1.0625, mean_rel=0.25771912932395935, max_rel=117.33737182617188, norm_rel=0.009128561243414879, ref_abs_avg=37.14860534667969, test_abs_avg=37.12187194824219
liger_forward grad[3] vs paper_forward: mean_abs=0.38495105504989624, max_abs=3.0, mean_rel=0.06018799915909767, max_rel=849.8007202148438, norm_rel=0.008688055910170078, ref_abs_avg=46.08859634399414, test_abs_avg=46.089454650878906
liger_forward grad[4] vs paper_forward: mean_abs=0.36762917041778564, max_abs=2.5, mean_rel=0.05851609259843826, max_rel=294.14678955078125, norm_rel=0.008443697355687618, ref_abs_avg=45.374664306640625, test_abs_avg=45.37431335449219
liger_forward grad[5] vs paper_forward: mean_abs=0.25991249084472656, max_abs=1.0, mean_rel=0.027499079704284668, max_rel=3.1387267112731934, norm_rel=0.008130406960844994, ref_abs_avg=33.188201904296875, test_abs_avg=33.212974548339844
liger_forward grad[6] vs paper_forward: mean_abs=0.32647624611854553, max_abs=2.0, mean_rel=0.055401433259248734, max_rel=516.2566528320312, norm_rel=0.008408427238464355, ref_abs_avg=40.45201110839844, test_abs_avg=40.45263671875
liger_forward grad[7] vs paper_forward: mean_abs=0.3157121241092682, max_abs=2.25, mean_rel=0.05519389361143112, max_rel=717.1696166992188, norm_rel=0.008214841596782207, ref_abs_avg=40.12800598144531, test_abs_avg=40.126426696777344
liger_forward grad[8] vs paper_forward: mean_abs=0.24693012237548828, max_abs=1.0, mean_rel=0.036719292402267456, max_rel=6.268547534942627, norm_rel=0.008409191854298115, ref_abs_avg=32.132957458496094, test_abs_avg=32.12895965576172
liger_forward grad[9] vs paper_forward: mean_abs=0.2949385643005371, max_abs=2.0, mean_rel=0.05502670258283615, max_rel=524.1376953125, norm_rel=0.008271101862192154, ref_abs_avg=37.146080017089844, test_abs_avg=37.146514892578125
liger_forward grad[10] vs paper_forward: mean_abs=0.2828608751296997, max_abs=2.0, mean_rel=0.050099894404411316, max_rel=193.1341552734375, norm_rel=0.008059423416852951, ref_abs_avg=36.69757843017578, test_abs_avg=36.6959114074707
liger_forward grad[11] vs paper_forward: mean_abs=0.22281980514526367, max_abs=1.03125, mean_rel=0.044896118342876434, max_rel=8.625431060791016, norm_rel=0.008467902429401875, ref_abs_avg=27.578472137451172, test_abs_avg=27.579776763916016
liger_forward grad[12] vs paper_forward: mean_abs=0.26978373527526855, max_abs=2.0, mean_rel=0.05496438965201378, max_rel=383.4307861328125, norm_rel=0.008118538185954094, ref_abs_avg=34.70658874511719, test_abs_avg=34.70724105834961
liger_forward grad[13] vs paper_forward: mean_abs=0.25855720043182373, max_abs=1.75, mean_rel=0.049747683107852936, max_rel=209.83108520507812, norm_rel=0.00785265676677227, ref_abs_avg=34.41774368286133, test_abs_avg=34.41733932495117
liger_forward grad[14] vs paper_forward: mean_abs=0.21421432495117188, max_abs=0.75, mean_rel=0.06338974833488464, max_rel=12.480189323425293, norm_rel=0.008588596247136593, ref_abs_avg=26.16452407836914, test_abs_avg=26.14598846435547
liger_forward grad[15] vs paper_forward: mean_abs=0.24746054410934448, max_abs=1.5, mean_rel=0.05089782923460007, max_rel=345.9400329589844, norm_rel=0.007922492921352386, ref_abs_avg=32.700439453125, test_abs_avg=32.69989776611328
liger_forward grad[16] vs paper_forward: mean_abs=0.23933422565460205, max_abs=1.5, mean_rel=0.05508826673030853, max_rel=360.27972412109375, norm_rel=0.007776884827762842, ref_abs_avg=32.27992248535156, test_abs_avg=32.281036376953125
liger_forward grad[17] vs paper_forward: mean_abs=0.19337892532348633, max_abs=0.7578125, mean_rel=0.03534688800573349, max_rel=2.3372981548309326, norm_rel=0.008352953940629959, ref_abs_avg=24.31856346130371, test_abs_avg=24.32460594177246
liger_forward grad[18] vs paper_forward: mean_abs=0.2310270071029663, max_abs=1.5, mean_rel=0.051671598106622696, max_rel=485.13726806640625, norm_rel=0.007812343072146177, ref_abs_avg=30.981300354003906, test_abs_avg=30.981159210205078
liger_forward grad[19] vs paper_forward: mean_abs=0.2214270830154419, max_abs=1.5, mean_rel=0.04905585199594498, max_rel=577.9451904296875, norm_rel=0.007656794972717762, ref_abs_avg=30.419719696044922, test_abs_avg=30.419673919677734
liger_forward grad[20] vs paper_forward: mean_abs=0.18702125549316406, max_abs=0.75, mean_rel=0.040059566497802734, max_rel=2.6827292442321777, norm_rel=0.007979300804436207, ref_abs_avg=23.747114181518555, test_abs_avg=23.72934341430664
liger_forward grad[21] vs paper_forward: mean_abs=0.21644523739814758, max_abs=1.5, mean_rel=0.0486101433634758, max_rel=301.790283203125, norm_rel=0.007734807673841715, ref_abs_avg=29.34557342529297, test_abs_avg=29.34609603881836
liger_forward grad[22] vs paper_forward: mean_abs=0.20751860737800598, max_abs=1.25, mean_rel=0.05580174922943115, max_rel=348.90234375, norm_rel=0.007524517364799976, ref_abs_avg=29.026811599731445, test_abs_avg=29.02643585205078
liger_forward grad[23] vs paper_forward: mean_abs=0.17150115966796875, max_abs=0.75, mean_rel=0.02470209077000618, max_rel=1.2552545070648193, norm_rel=0.007923761382699013, ref_abs_avg=23.125001907348633, test_abs_avg=23.123865127563477
liger_forward grad[24] vs paper_forward: mean_abs=0.20249255001544952, max_abs=1.25, mean_rel=0.051202110946178436, max_rel=260.4891052246094, norm_rel=0.007593340240418911, ref_abs_avg=28.016029357910156, test_abs_avg=28.016254425048828
liger_forward grad[25] vs paper_forward: mean_abs=0.1949097216129303, max_abs=1.5, mean_rel=0.04352743178606033, max_rel=235.7125701904297, norm_rel=0.007371507119387388, ref_abs_avg=27.88187599182129, test_abs_avg=27.881290435791016
liger_forward grad[26] vs paper_forward: mean_abs=0.1838817596435547, max_abs=0.75, mean_rel=0.02868734300136566, max_rel=2.3703181743621826, norm_rel=0.00787001010030508, ref_abs_avg=24.790836334228516, test_abs_avg=24.779598236083984
liger_forward grad[27] vs paper_forward: mean_abs=0.22186407446861267, max_abs=1.625, mean_rel=0.049312442541122437, max_rel=538.33056640625, norm_rel=0.007714861538261175, ref_abs_avg=30.147903442382812, test_abs_avg=30.148012161254883
liger_forward grad[28] vs paper_forward: mean_abs=0.2148260772228241, max_abs=1.5, mean_rel=0.04727093130350113, max_rel=350.23028564453125, norm_rel=0.007591994013637304, ref_abs_avg=29.765010833740234, test_abs_avg=29.764240264892578
liger_forward grad[29] vs paper_forward: mean_abs=0.16603565216064453, max_abs=0.75, mean_rel=0.041037704795598984, max_rel=4.277005672454834, norm_rel=0.007773521821945906, ref_abs_avg=22.038625717163086, test_abs_avg=22.025596618652344
liger_forward grad[30] vs paper_forward: mean_abs=0.19944657385349274, max_abs=1.25, mean_rel=0.046757377684116364, max_rel=210.42794799804688, norm_rel=0.007602155674248934, ref_abs_avg=27.573139190673828, test_abs_avg=27.573230743408203
liger_forward grad[31] vs paper_forward: mean_abs=0.19074033200740814, max_abs=1.125, mean_rel=0.04841756075620651, max_rel=399.4530029296875, norm_rel=0.007416740525513887, ref_abs_avg=27.074493408203125, test_abs_avg=27.075428009033203
liger_forward grad[32] vs paper_forward: mean_abs=0.15924733877182007, max_abs=0.75, mean_rel=0.050398558378219604, max_rel=5.6934919357299805, norm_rel=0.008311834186315536, ref_abs_avg=20.309398651123047, test_abs_avg=20.306516647338867
liger_forward grad[33] vs paper_forward: mean_abs=0.18271534144878387, max_abs=1.3125, mean_rel=0.04731915146112442, max_rel=264.5998229980469, norm_rel=0.007424552459269762, ref_abs_avg=25.884750366210938, test_abs_avg=25.884971618652344
liger_forward grad[34] vs paper_forward: mean_abs=0.17431631684303284, max_abs=1.1875, mean_rel=0.043289270251989365, max_rel=318.0856018066406, norm_rel=0.007207848597317934, ref_abs_avg=25.563133239746094, test_abs_avg=25.562210083007812
liger_forward grad[35] vs paper_forward: mean_abs=0.1415640115737915, max_abs=0.625, mean_rel=0.04096494987607002, max_rel=6.578947067260742, norm_rel=0.007301934529095888, ref_abs_avg=20.06636619567871, test_abs_avg=20.074129104614258
liger_forward grad[36] vs paper_forward: mean_abs=0.16898532211780548, max_abs=1.0625, mean_rel=0.04336579144001007, max_rel=349.9263000488281, norm_rel=0.007280521560460329, ref_abs_avg=24.437381744384766, test_abs_avg=24.437213897705078
liger_forward grad[37] vs paper_forward: mean_abs=0.16258664429187775, max_abs=1.0, mean_rel=0.04429389163851738, max_rel=298.5121765136719, norm_rel=0.0071120490320026875, ref_abs_avg=24.215129852294922, test_abs_avg=24.215084075927734
liger_forward grad[38] vs paper_forward: mean_abs=0.13128399848937988, max_abs=0.625, mean_rel=0.05771395564079285, max_rel=19.111074447631836, norm_rel=0.0074111721478402615, ref_abs_avg=18.66110610961914, test_abs_avg=18.664066314697266
liger_forward grad[39] vs paper_forward: mean_abs=0.15794044733047485, max_abs=1.125, mean_rel=0.045520007610321045, max_rel=163.24996948242188, norm_rel=0.0072073363699018955, ref_abs_avg=23.1020565032959, test_abs_avg=23.10173797607422
liger_forward grad[40] vs paper_forward: mean_abs=0.15149125456809998, max_abs=1.125, mean_rel=0.04132779687643051, max_rel=134.97789001464844, norm_rel=0.006980331614613533, ref_abs_avg=23.006858825683594, test_abs_avg=23.007665634155273
liger_forward grad[41] vs paper_forward: mean_abs=0.12476539611816406, max_abs=0.5, mean_rel=0.02221732586622238, max_rel=1.6765838861465454, norm_rel=0.007343478500843048, ref_abs_avg=17.901836395263672, test_abs_avg=17.896329879760742
liger_forward grad[42] vs paper_forward: mean_abs=0.14931617677211761, max_abs=1.03125, mean_rel=0.041232623159885406, max_rel=226.5622100830078, norm_rel=0.0070135039277374744, ref_abs_avg=22.475770950317383, test_abs_avg=22.475482940673828
liger_forward grad[43] vs paper_forward: mean_abs=0.14190012216567993, max_abs=1.0, mean_rel=0.04231669753789902, max_rel=241.83164978027344, norm_rel=0.00679742032662034, ref_abs_avg=22.21294403076172, test_abs_avg=22.212949752807617
liger_forward grad[44] vs paper_forward: mean_abs=0.112601637840271, max_abs=0.5, mean_rel=0.0480055958032608, max_rel=9.645551681518555, norm_rel=0.0066053117625415325, ref_abs_avg=18.05931854248047, test_abs_avg=18.056175231933594
liger_forward grad[45] vs paper_forward: mean_abs=0.1399211883544922, max_abs=1.0, mean_rel=0.044868238270282745, max_rel=293.09576416015625, norm_rel=0.006938362028449774, ref_abs_avg=21.324180603027344, test_abs_avg=21.324493408203125
liger_forward grad[46] vs paper_forward: mean_abs=0.13473281264305115, max_abs=1.0, mean_rel=0.04166017845273018, max_rel=242.0620880126953, norm_rel=0.006811639294028282, ref_abs_avg=21.010133743286133, test_abs_avg=21.010330200195312
liger_forward grad[47] vs paper_forward: mean_abs=0.10702276229858398, max_abs=0.4375, mean_rel=0.03331369906663895, max_rel=7.497584819793701, norm_rel=0.006482247728854418, ref_abs_avg=17.265851974487305, test_abs_avg=17.262447357177734
liger_forward grad[48] vs paper_forward: mean_abs=0.13344880938529968, max_abs=1.0, mean_rel=0.04258991777896881, max_rel=300.83013916015625, norm_rel=0.006820059847086668, ref_abs_avg=20.722787857055664, test_abs_avg=20.72305679321289
liger_forward grad[49] vs paper_forward: mean_abs=0.1278320997953415, max_abs=1.0, mean_rel=0.04123419150710106, max_rel=172.00692749023438, norm_rel=0.006609873380511999, ref_abs_avg=20.58928871154785, test_abs_avg=20.59088134765625
liger_forward grad[50] vs paper_forward: mean_abs=0.11695745587348938, max_abs=0.5, mean_rel=0.05423877760767937, max_rel=6.60092830657959, norm_rel=0.007095708046108484, ref_abs_avg=17.137943267822266, test_abs_avg=17.146177291870117
liger_forward grad[51] vs paper_forward: mean_abs=0.14805075526237488, max_abs=1.0, mean_rel=0.045156750828027725, max_rel=156.4462127685547, norm_rel=0.00732599338516593, ref_abs_avg=21.321224212646484, test_abs_avg=21.32106590270996
liger_forward grad[52] vs paper_forward: mean_abs=0.14289303123950958, max_abs=1.0, mean_rel=0.047179337590932846, max_rel=454.0195617675781, norm_rel=0.007154161110520363, ref_abs_avg=21.113983154296875, test_abs_avg=21.1148681640625
liger_forward grad[53] vs paper_forward: mean_abs=0.11309173703193665, max_abs=0.5, mean_rel=0.23834596574306488, max_rel=106.02172088623047, norm_rel=0.0072904424741864204, ref_abs_avg=16.23176383972168, test_abs_avg=16.220157623291016
liger_forward grad[54] vs paper_forward: mean_abs=0.13457340002059937, max_abs=1.0, mean_rel=0.04268606752157211, max_rel=209.61849975585938, norm_rel=0.00711051793769002, ref_abs_avg=19.953784942626953, test_abs_avg=19.953453063964844
liger_forward grad[55] vs paper_forward: mean_abs=0.13070446252822876, max_abs=1.0, mean_rel=0.04492678493261337, max_rel=200.3812713623047, norm_rel=0.007031317800283432, ref_abs_avg=19.657779693603516, test_abs_avg=19.657604217529297
liger_forward grad[56] vs paper_forward: mean_abs=0.09873485565185547, max_abs=0.5, mean_rel=0.07629047334194183, max_rel=10.100606918334961, norm_rel=0.006689611356705427, ref_abs_avg=16.088924407958984, test_abs_avg=16.087387084960938
liger_forward grad[57] vs paper_forward: mean_abs=0.12497086822986603, max_abs=1.0, mean_rel=0.045283593237400055, max_rel=242.06178283691406, norm_rel=0.006933513097465038, ref_abs_avg=19.05257225036621, test_abs_avg=19.052350997924805
liger_forward grad[58] vs paper_forward: mean_abs=0.1213148757815361, max_abs=0.75, mean_rel=0.04202500730752945, max_rel=135.87298583984375, norm_rel=0.006833561230450869, ref_abs_avg=18.80270767211914, test_abs_avg=18.803085327148438
liger_forward grad[59] vs paper_forward: mean_abs=0.09967902302742004, max_abs=0.375, mean_rel=0.1518259048461914, max_rel=64.09058380126953, norm_rel=0.0069862245582044125, ref_abs_avg=14.636616706848145, test_abs_avg=14.642145156860352
liger_forward grad[60] vs paper_forward: mean_abs=0.11671324819326401, max_abs=0.75, mean_rel=0.04266144707798958, max_rel=175.6488800048828, norm_rel=0.006866414099931717, ref_abs_avg=17.970178604125977, test_abs_avg=17.970191955566406
liger_forward grad[61] vs paper_forward: mean_abs=0.11301889270544052, max_abs=0.75, mean_rel=0.036936983466148376, max_rel=125.21045684814453, norm_rel=0.006590445525944233, ref_abs_avg=18.24830436706543, test_abs_avg=18.247285842895508
liger_forward grad[62] vs paper_forward: mean_abs=0.09509700536727905, max_abs=0.5, mean_rel=0.0384734645485878, max_rel=2.775804042816162, norm_rel=0.006691462825983763, ref_abs_avg=14.903053283691406, test_abs_avg=14.891745567321777
liger_forward grad[63] vs paper_forward: mean_abs=0.11095716059207916, max_abs=1.0, mean_rel=0.04092082381248474, max_rel=288.00616455078125, norm_rel=0.006636034697294235, ref_abs_avg=17.71542739868164, test_abs_avg=17.71514129638672
liger_forward grad[64] vs paper_forward: mean_abs=0.10682954639196396, max_abs=0.75, mean_rel=0.037517085671424866, max_rel=131.9080047607422, norm_rel=0.006618097890168428, ref_abs_avg=17.196731567382812, test_abs_avg=17.19558334350586
liger_forward grad[65] vs paper_forward: mean_abs=0.08337706327438354, max_abs=0.359375, mean_rel=0.032264888286590576, max_rel=2.4384396076202393, norm_rel=0.006361614912748337, ref_abs_avg=13.649078369140625, test_abs_avg=13.644380569458008
liger_forward grad[66] vs paper_forward: mean_abs=0.10475447028875351, max_abs=1.0, mean_rel=0.039610255509614944, max_rel=277.0379943847656, norm_rel=0.006580707151442766, ref_abs_avg=16.889799118041992, test_abs_avg=16.889636993408203
liger_forward grad[67] vs paper_forward: mean_abs=0.10133601725101471, max_abs=0.75, mean_rel=0.04023895040154457, max_rel=183.5209197998047, norm_rel=0.00640317564830184, ref_abs_avg=16.89249038696289, test_abs_avg=16.892131805419922
liger_forward grad[68] vs paper_forward: mean_abs=0.08309745788574219, max_abs=0.375, mean_rel=0.029524635523557663, max_rel=4.5119948387146, norm_rel=0.006614759098738432, ref_abs_avg=12.985993385314941, test_abs_avg=12.98654842376709
liger_forward grad[69] vs paper_forward: mean_abs=0.09978058934211731, max_abs=0.9375, mean_rel=0.03873031586408615, max_rel=175.571044921875, norm_rel=0.006441786419600248, ref_abs_avg=16.481361389160156, test_abs_avg=16.48099708557129
liger_forward grad[70] vs paper_forward: mean_abs=0.09598767012357712, max_abs=0.75, mean_rel=0.03760121390223503, max_rel=154.61741638183594, norm_rel=0.006281364243477583, ref_abs_avg=16.345840454101562, test_abs_avg=16.345504760742188
liger_forward grad[71] vs paper_forward: mean_abs=0.07583403587341309, max_abs=0.375, mean_rel=0.03308891877532005, max_rel=2.998444080352783, norm_rel=0.006503063719719648, ref_abs_avg=12.604347229003906, test_abs_avg=12.606130599975586
liger_forward grad[72] vs paper_forward: mean_abs=0.09433965384960175, max_abs=0.75, mean_rel=0.0385667160153389, max_rel=107.64192199707031, norm_rel=0.006419884506613016, ref_abs_avg=15.65849494934082, test_abs_avg=15.658404350280762
liger_forward grad[73] vs paper_forward: mean_abs=0.09156028926372528, max_abs=0.75, mean_rel=0.034825973212718964, max_rel=164.7210693359375, norm_rel=0.006190967280417681, ref_abs_avg=15.876197814941406, test_abs_avg=15.875755310058594
liger_forward grad[74] vs paper_forward: mean_abs=0.0893615186214447, max_abs=0.3125, mean_rel=0.07462063431739807, max_rel=14.455283164978027, norm_rel=0.006623114459216595, ref_abs_avg=14.351237297058105, test_abs_avg=14.350460052490234
liger_forward grad[75] vs paper_forward: mean_abs=0.10499492287635803, max_abs=1.0, mean_rel=0.042977988719940186, max_rel=231.91578674316406, norm_rel=0.0068501923233270645, ref_abs_avg=16.207693099975586, test_abs_avg=16.20697593688965
liger_forward grad[76] vs paper_forward: mean_abs=0.10138498246669769, max_abs=0.75, mean_rel=0.04135119169950485, max_rel=213.90562438964844, norm_rel=0.006667235400527716, ref_abs_avg=16.18167495727539, test_abs_avg=16.181005477905273
liger_forward grad[77] vs paper_forward: mean_abs=0.07925903797149658, max_abs=0.328125, mean_rel=0.02878507412970066, max_rel=2.246417284011841, norm_rel=0.006734625902026892, ref_abs_avg=12.452445983886719, test_abs_avg=12.448659896850586
liger_forward grad[78] vs paper_forward: mean_abs=0.09770965576171875, max_abs=0.8125, mean_rel=0.03892595320940018, max_rel=201.38111877441406, norm_rel=0.0066809323616325855, ref_abs_avg=15.500965118408203, test_abs_avg=15.50088882446289
liger_forward grad[79] vs paper_forward: mean_abs=0.09477989375591278, max_abs=1.0, mean_rel=0.038330718874931335, max_rel=145.35643005371094, norm_rel=0.006607742514461279, ref_abs_avg=15.218202590942383, test_abs_avg=15.217863082885742
liger_forward grad[80] vs paper_forward: mean_abs=0.07519960403442383, max_abs=0.25, mean_rel=0.028231794014573097, max_rel=3.05764102935791, norm_rel=0.006615540478378534, ref_abs_avg=12.167888641357422, test_abs_avg=12.16455364227295
liger_forward grad[81] vs paper_forward: mean_abs=0.09070071578025818, max_abs=1.0, mean_rel=0.03967791795730591, max_rel=236.9915313720703, norm_rel=0.0065415604040026665, ref_abs_avg=14.755105972290039, test_abs_avg=14.755067825317383
liger_forward grad[82] vs paper_forward: mean_abs=0.08934562653303146, max_abs=0.8125, mean_rel=0.03762288764119148, max_rel=108.28154754638672, norm_rel=0.006492649670690298, ref_abs_avg=14.709151268005371, test_abs_avg=14.70870304107666
liger_forward grad[83] vs paper_forward: mean_abs=0.07430267333984375, max_abs=0.28125, mean_rel=0.024112295359373093, max_rel=1.9397807121276855, norm_rel=0.006611383054405451, ref_abs_avg=11.660294532775879, test_abs_avg=11.654650688171387
liger_forward grad[84] vs paper_forward: mean_abs=0.08683815598487854, max_abs=0.75, mean_rel=0.036713723093271255, max_rel=105.59767150878906, norm_rel=0.006463751662522554, ref_abs_avg=14.349746704101562, test_abs_avg=14.350004196166992
liger_forward grad[85] vs paper_forward: mean_abs=0.0835501104593277, max_abs=1.0, mean_rel=0.035215843468904495, max_rel=146.29916381835938, norm_rel=0.006272732745856047, ref_abs_avg=14.331979751586914, test_abs_avg=14.332077980041504
liger_forward grad[86] vs paper_forward: mean_abs=0.06414556503295898, max_abs=0.25, mean_rel=0.018711578100919724, max_rel=0.87728351354599, norm_rel=0.005798385012894869, ref_abs_avg=12.2567138671875, test_abs_avg=12.259991645812988
liger_forward grad[87] vs paper_forward: mean_abs=0.0802186131477356, max_abs=1.0, mean_rel=0.037244200706481934, max_rel=266.4895935058594, norm_rel=0.006232124287635088, ref_abs_avg=13.819620132446289, test_abs_avg=13.819629669189453
liger_forward grad[88] vs paper_forward: mean_abs=0.07866379618644714, max_abs=0.75, mean_rel=0.03464557230472565, max_rel=87.25617218017578, norm_rel=0.006249821744859219, ref_abs_avg=13.634992599487305, test_abs_avg=13.634849548339844
liger_forward grad[89] vs paper_forward: mean_abs=0.06479263305664062, max_abs=0.25, mean_rel=0.02078910358250141, max_rel=2.4870450496673584, norm_rel=0.006354608107358217, ref_abs_avg=10.815873146057129, test_abs_avg=10.818320274353027
liger_forward grad[90] vs paper_forward: mean_abs=0.07447481155395508, max_abs=0.9375, mean_rel=0.035509295761585236, max_rel=111.66645812988281, norm_rel=0.006107644643634558, ref_abs_avg=13.171104431152344, test_abs_avg=13.171285629272461
liger_forward grad[91] vs paper_forward: mean_abs=0.07353010773658752, max_abs=0.75, mean_rel=0.034461043775081635, max_rel=138.59945678710938, norm_rel=0.006120308768004179, ref_abs_avg=13.026773452758789, test_abs_avg=13.027016639709473
liger_forward grad[92] vs paper_forward: mean_abs=0.062256455421447754, max_abs=0.25, mean_rel=0.020317845046520233, max_rel=0.8889342546463013, norm_rel=0.006231341511011124, ref_abs_avg=10.704524993896484, test_abs_avg=10.70250129699707
liger_forward grad[93] vs paper_forward: mean_abs=0.07060317695140839, max_abs=0.75, mean_rel=0.03378608822822571, max_rel=132.9857177734375, norm_rel=0.005938166286796331, ref_abs_avg=12.943321228027344, test_abs_avg=12.943267822265625
liger_forward grad[94] vs paper_forward: mean_abs=0.06841111928224564, max_abs=0.75, mean_rel=0.034081488847732544, max_rel=188.3839111328125, norm_rel=0.005987652111798525, ref_abs_avg=12.554539680480957, test_abs_avg=12.55552864074707
liger_forward grad[95] vs paper_forward: mean_abs=0.05915641784667969, max_abs=0.25, mean_rel=0.028996586799621582, max_rel=4.335955619812012, norm_rel=0.006247055716812611, ref_abs_avg=10.338920593261719, test_abs_avg=10.338859558105469
liger_forward grad[96] vs paper_forward: mean_abs=0.06706162542104721, max_abs=1.0, mean_rel=0.035004422068595886, max_rel=329.5043029785156, norm_rel=0.005871027708053589, ref_abs_avg=12.53325080871582, test_abs_avg=12.533053398132324
liger_forward grad[97] vs paper_forward: mean_abs=0.06547345966100693, max_abs=0.625, mean_rel=0.033527594059705734, max_rel=164.1577911376953, norm_rel=0.0058387694880366325, ref_abs_avg=12.405479431152344, test_abs_avg=12.405067443847656
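
Note on reading the comparison lines above: the log does not show how each statistic is computed, so the following is only a minimal sketch under assumed definitions. The function name `compare_stats`, the `eps` guard, and the exact formulas (element-wise relative error divided by `|ref|`, and `norm_rel` as the L2 norm of the difference over the L2 norm of the reference) are assumptions for illustration, not the benchmark's actual code.

```python
import math


def compare_stats(test, ref, eps=1e-12):
    """Error statistics between two equal-length flat sequences of floats.

    All formulas here are assumed definitions that match the field names
    in the log; the real script may differ (for example in how it guards
    against division by zero).
    """
    if len(test) != len(ref) or len(ref) == 0:
        raise ValueError("test and ref must be non-empty and the same length")

    n = len(ref)
    abs_err = [abs(t - r) for t, r in zip(test, ref)]
    # Element-wise relative error; eps avoids division by an exact zero.
    rel_err = [e / (abs(r) + eps) for e, r in zip(abs_err, ref)]

    diff_norm = math.sqrt(sum(e * e for e in abs_err))
    ref_norm = math.sqrt(sum(r * r for r in ref))

    return {
        "mean_abs": sum(abs_err) / n,
        "max_abs": max(abs_err),
        "mean_rel": sum(rel_err) / n,
        "max_rel": max(rel_err),
        "norm_rel": diff_norm / (ref_norm + eps),
        "ref_abs_avg": sum(abs(r) for r in ref) / n,
        "test_abs_avg": sum(abs(t) for t in test) / n,
    }


if __name__ == "__main__":
    stats = compare_stats([1.0, 2.0, 4.0], [1.0, 2.0, 3.0])
    for key, value in stats.items():
        print(f"{key}={value}")
```

Under these assumed definitions, a single element whose reference value is close to zero can push `max_rel` into the hundreds while `norm_rel` stays small, because `norm_rel` weights the error by the magnitude of the whole tensor. That would be consistent with the pattern in the lines above, where `max_rel` is large but `norm_rel` and the `ref_abs_avg` / `test_abs_avg` pair agree closely.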
identity layers + randn queries
paper_forward fwd+bwd:  112.816 ms
paper_forward bwd-only: 88.977 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB
liger_forward fwd+bwd:  45.274 ms
liger_forward bwd-only: 32.906 ms
liger_forward peak allocated: fwd=7.727 GiB, fwd+bwd=7.727 GiB
liger_forward peak reserved:  fwd=7.775 GiB, fwd+bwd=8.088 GiB
production_forward fwd+bwd:  33.819 ms
production_forward bwd-only: 28.835 ms
production_forward peak allocated: fwd=1.174 GiB, fwd+bwd=5.176 GiB
production_forward peak reserved:  fwd=1.240 GiB, fwd+bwd=5.240 GiB
torch_compile_phases_forward fwd+bwd:  48.553 ms
torch_compile_phases_forward bwd-only: 39.385 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.752 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0015912111848592758, max_abs=0.046875
production_forward grad[0] vs paper_forward: mean_abs=0.007936087436974049, max_abs=0.265625, mean_rel=0.070296511054039, max_rel=97.3750228881836, norm_rel=0.019274309277534485, ref_abs_avg=0.44614535570144653, test_abs_avg=0.44615769386291504
production_forward grad[1] vs paper_forward: mean_abs=4.903336048126221, max_abs=48.0, mean_rel=0.2369883954524994, max_rel=691.9185791015625, norm_rel=0.019463086500763893, ref_abs_avg=217.13296508789062, test_abs_avg=217.18125915527344
production_forward grad[2] vs paper_forward: mean_abs=0.8247900009155273, max_abs=2.875, mean_rel=0.1210692971944809, max_rel=16.332931518554688, norm_rel=0.02387145347893238, ref_abs_avg=34.60786056518555, test_abs_avg=34.69521713256836
production_forward grad[3] vs paper_forward: mean_abs=0.9835201501846313, max_abs=6.25, mean_rel=0.16166266798973083, max_rel=1401.7481689453125, norm_rel=0.022527756169438362, ref_abs_avg=43.89195251464844, test_abs_avg=43.89385986328125
production_forward grad[4] vs paper_forward: mean_abs=0.9476498961448669, max_abs=6.75, mean_rel=0.1541229486465454, max_rel=812.6004028320312, norm_rel=0.022223755717277527, ref_abs_avg=42.87574768066406, test_abs_avg=42.884212493896484
production_forward grad[5] vs paper_forward: mean_abs=0.6981449127197266, max_abs=2.75, mean_rel=0.06689925491809845, max_rel=3.54374098777771, norm_rel=0.020092405378818512, ref_abs_avg=35.3297119140625, test_abs_avg=35.321311950683594
production_forward grad[6] vs paper_forward: mean_abs=0.8704418540000916, max_abs=6.0, mean_rel=0.16091252863407135, max_rel=1377.2589111328125, norm_rel=0.022313673049211502, ref_abs_avg=39.208553314208984, test_abs_avg=39.21076965332031
production_forward grad[7] vs paper_forward: mean_abs=0.8482118844985962, max_abs=5.375, mean_rel=0.15371322631835938, max_rel=1234.8697509765625, norm_rel=0.022122519090771675, ref_abs_avg=38.53498840332031, test_abs_avg=38.536537170410156
production_forward grad[8] vs paper_forward: mean_abs=0.622952938079834, max_abs=2.75, mean_rel=0.11801041662693024, max_rel=12.539897918701172, norm_rel=0.021654918789863586, ref_abs_avg=29.047752380371094, test_abs_avg=29.094823837280273
production_forward grad[9] vs paper_forward: mean_abs=0.7950041890144348, max_abs=5.25, mean_rel=0.14663879573345184, max_rel=899.8923950195312, norm_rel=0.02226240746676922, ref_abs_avg=35.92656326293945, test_abs_avg=35.92578125
production_forward grad[10] vs paper_forward: mean_abs=0.7759114503860474, max_abs=4.875, mean_rel=0.15285082161426544, max_rel=1759.10791015625, norm_rel=0.022001195698976517, ref_abs_avg=35.457969665527344, test_abs_avg=35.45880126953125
production_forward grad[11] vs paper_forward: mean_abs=0.6084061861038208, max_abs=2.5, mean_rel=0.22349631786346436, max_rel=66.98184967041016, norm_rel=0.024851173162460327, ref_abs_avg=25.02871322631836, test_abs_avg=24.993797302246094
production_forward grad[12] vs paper_forward: mean_abs=0.7374005913734436, max_abs=4.875, mean_rel=0.1465854048728943, max_rel=1969.565185546875, norm_rel=0.021993743255734444, ref_abs_avg=33.68937683105469, test_abs_avg=33.690155029296875
production_forward grad[13] vs paper_forward: mean_abs=0.7190529704093933, max_abs=4.75, mean_rel=0.14253583550453186, max_rel=924.9057006835938, norm_rel=0.021788328886032104, ref_abs_avg=33.183284759521484, test_abs_avg=33.19320297241211
production_forward grad[14] vs paper_forward: mean_abs=0.5680999755859375, max_abs=2.34375, mean_rel=0.07794319093227386, max_rel=6.369848251342773, norm_rel=0.02246011048555374, ref_abs_avg=25.348655700683594, test_abs_avg=25.310401916503906
production_forward grad[15] vs paper_forward: mean_abs=0.6890403032302856, max_abs=4.125, mean_rel=0.15025773644447327, max_rel=1798.892822265625, norm_rel=0.02179555408656597, ref_abs_avg=31.795820236206055, test_abs_avg=31.79711151123047
production_forward grad[16] vs paper_forward: mean_abs=0.6740647554397583, max_abs=3.78125, mean_rel=0.15063145756721497, max_rel=928.8806762695312, norm_rel=0.02167043276131153, ref_abs_avg=31.279022216796875, test_abs_avg=31.283714294433594
production_forward grad[17] vs paper_forward: mean_abs=0.5062718391418457, max_abs=2.375, mean_rel=0.10348870605230331, max_rel=13.05823802947998, norm_rel=0.020261259749531746, ref_abs_avg=26.261348724365234, test_abs_avg=26.27299690246582
production_forward grad[18] vs paper_forward: mean_abs=0.6547384262084961, max_abs=4.0, mean_rel=0.1455640345811844, max_rel=1065.94580078125, norm_rel=0.02179928682744503, ref_abs_avg=30.179981231689453, test_abs_avg=30.181766510009766
production_forward grad[19] vs paper_forward: mean_abs=0.6371356248855591, max_abs=3.8125, mean_rel=0.14968657493591309, max_rel=1010.1336669921875, norm_rel=0.021659763529896736, ref_abs_avg=29.57010269165039, test_abs_avg=29.57476806640625
production_forward grad[20] vs paper_forward: mean_abs=0.515053391456604, max_abs=2.125, mean_rel=0.12205755710601807, max_rel=23.886240005493164, norm_rel=0.022541915997862816, ref_abs_avg=23.353086471557617, test_abs_avg=23.3747615814209
production_forward grad[21] vs paper_forward: mean_abs=0.6195188760757446, max_abs=3.515625, mean_rel=0.1346554160118103, max_rel=1063.8851318359375, norm_rel=0.021638767793774605, ref_abs_avg=28.789886474609375, test_abs_avg=28.791736602783203
production_forward grad[22] vs paper_forward: mean_abs=0.6069578528404236, max_abs=3.875, mean_rel=0.13981567323207855, max_rel=640.7174072265625, norm_rel=0.021390365436673164, ref_abs_avg=28.49123764038086, test_abs_avg=28.490718841552734
production_forward grad[23] vs paper_forward: mean_abs=0.4721760153770447, max_abs=1.625, mean_rel=0.13509723544120789, max_rel=31.289684295654297, norm_rel=0.02006755769252777, ref_abs_avg=23.471450805664062, test_abs_avg=23.45985984802246
production_forward grad[24] vs paper_forward: mean_abs=0.5904543995857239, max_abs=3.607421875, mean_rel=0.14132560789585114, max_rel=1122.8411865234375, norm_rel=0.021416958421468735, ref_abs_avg=27.709300994873047, test_abs_avg=27.709705352783203
production_forward grad[25] vs paper_forward: mean_abs=0.5762278437614441, max_abs=3.625, mean_rel=0.14886429905891418, max_rel=1174.599853515625, norm_rel=0.021192800253629684, ref_abs_avg=27.332847595214844, test_abs_avg=27.33310317993164
production_forward grad[26] vs paper_forward: mean_abs=0.6074117422103882, max_abs=2.25, mean_rel=0.23412927985191345, max_rel=41.21668243408203, norm_rel=0.024273646995425224, ref_abs_avg=24.69881248474121, test_abs_avg=24.731136322021484
production_forward grad[27] vs paper_forward: mean_abs=0.6754688620567322, max_abs=4.0625, mean_rel=0.16000555455684662, max_rel=1077.8013916015625, norm_rel=0.023206956684589386, ref_abs_avg=29.237552642822266, test_abs_avg=29.24129295349121
production_forward grad[28] vs paper_forward: mean_abs=0.6586720943450928, max_abs=4.0, mean_rel=0.14330251514911652, max_rel=833.8500366210938, norm_rel=0.02297954075038433, ref_abs_avg=28.820735931396484, test_abs_avg=28.825759887695312
production_forward grad[29] vs paper_forward: mean_abs=0.5207500457763672, max_abs=2.01953125, mean_rel=0.05936143547296524, max_rel=3.510892629623413, norm_rel=0.022733401507139206, ref_abs_avg=23.081195831298828, test_abs_avg=23.038469314575195
production_forward grad[30] vs paper_forward: mean_abs=0.6303110122680664, max_abs=4.5625, mean_rel=0.14953497052192688, max_rel=738.2168579101562, norm_rel=0.023423844948410988, ref_abs_avg=26.97092056274414, test_abs_avg=26.97250747680664
production_forward grad[31] vs paper_forward: mean_abs=0.6128004789352417, max_abs=5.0625, mean_rel=0.14054688811302185, max_rel=1119.1324462890625, norm_rel=0.023350201547145844, ref_abs_avg=26.347606658935547, test_abs_avg=26.34915542602539
production_forward grad[32] vs paper_forward: mean_abs=0.4852733612060547, max_abs=2.0625, mean_rel=0.08822304010391235, max_rel=9.669404029846191, norm_rel=0.024053407832980156, ref_abs_avg=20.52486228942871, test_abs_avg=20.486988067626953
production_forward grad[33] vs paper_forward: mean_abs=0.5963504910469055, max_abs=4.0, mean_rel=0.1580798178911209, max_rel=1546.6002197265625, norm_rel=0.023574454709887505, ref_abs_avg=25.366289138793945, test_abs_avg=25.368831634521484
production_forward grad[34] vs paper_forward: mean_abs=0.5834683179855347, max_abs=3.5, mean_rel=0.1565459817647934, max_rel=956.4345703125, norm_rel=0.023555031046271324, ref_abs_avg=24.845611572265625, test_abs_avg=24.851287841796875
production_forward grad[35] vs paper_forward: mean_abs=0.4767496585845947, max_abs=1.6875, mean_rel=0.1165945902466774, max_rel=16.167070388793945, norm_rel=0.022944247350096703, ref_abs_avg=20.53549575805664, test_abs_avg=20.556869506835938
production_forward grad[36] vs paper_forward: mean_abs=0.5540838241577148, max_abs=3.5, mean_rel=0.1515270620584488, max_rel=785.5155639648438, norm_rel=0.023366475477814674, ref_abs_avg=23.78932762145996, test_abs_avg=23.791717529296875
production_forward grad[37] vs paper_forward: mean_abs=0.5444830656051636, max_abs=3.375, mean_rel=0.15263180434703827, max_rel=850.690185546875, norm_rel=0.023404575884342194, ref_abs_avg=23.34163475036621, test_abs_avg=23.345481872558594
production_forward grad[38] vs paper_forward: mean_abs=0.4500141143798828, max_abs=1.78125, mean_rel=0.12606894969940186, max_rel=11.94687557220459, norm_rel=0.023192914202809334, ref_abs_avg=18.93120574951172, test_abs_avg=18.90944480895996
production_forward grad[39] vs paper_forward: mean_abs=0.5214273929595947, max_abs=3.25, mean_rel=0.14623087644577026, max_rel=936.9710083007812, norm_rel=0.023033395409584045, ref_abs_avg=22.700870513916016, test_abs_avg=22.702442169189453
production_forward grad[40] vs paper_forward: mean_abs=0.5096338987350464, max_abs=3.330078125, mean_rel=0.1460099220275879, max_rel=504.6988830566406, norm_rel=0.02275768294930458, ref_abs_avg=22.46453857421875, test_abs_avg=22.46784782409668
production_forward grad[41] vs paper_forward: mean_abs=0.4138292074203491, max_abs=1.375, mean_rel=0.3522956669330597, max_rel=124.45390319824219, norm_rel=0.023352758958935738, ref_abs_avg=17.604896545410156, test_abs_avg=17.594383239746094
production_forward grad[42] vs paper_forward: mean_abs=0.49460849165916443, max_abs=3.203125, mean_rel=0.14702221751213074, max_rel=1350.3387451171875, norm_rel=0.022862909361720085, ref_abs_avg=21.683080673217773, test_abs_avg=21.68187713623047
production_forward grad[43] vs paper_forward: mean_abs=0.4855227470397949, max_abs=3.25, mean_rel=0.14318493008613586, max_rel=1075.6884765625, norm_rel=0.022566698491573334, ref_abs_avg=21.58945655822754, test_abs_avg=21.593944549560547
production_forward grad[44] vs paper_forward: mean_abs=0.38678407669067383, max_abs=1.5, mean_rel=0.08722999691963196, max_rel=6.488088130950928, norm_rel=0.022750642150640488, ref_abs_avg=16.991310119628906, test_abs_avg=17.00955581665039
production_forward grad[45] vs paper_forward: mean_abs=0.4711030125617981, max_abs=3.03955078125, mean_rel=0.1476588249206543, max_rel=633.086181640625, norm_rel=0.022604303434491158, ref_abs_avg=20.876461029052734, test_abs_avg=20.876432418823242
production_forward grad[46] vs paper_forward: mean_abs=0.4614696502685547, max_abs=3.0625, mean_rel=0.1464708149433136, max_rel=1130.671142578125, norm_rel=0.022476959973573685, ref_abs_avg=20.56955337524414, test_abs_avg=20.56849479675293
production_forward grad[47] vs paper_forward: mean_abs=0.3484840393066406, max_abs=1.625, mean_rel=0.06474247574806213, max_rel=3.4875998497009277, norm_rel=0.02098957449197769, ref_abs_avg=17.271747589111328, test_abs_avg=17.25579071044922
production_forward grad[48] vs paper_forward: mean_abs=0.4491376280784607, max_abs=3.0, mean_rel=0.14960530400276184, max_rel=631.6649169921875, norm_rel=0.02229253016412258, ref_abs_avg=20.1634578704834, test_abs_avg=20.16414451599121
production_forward grad[49] vs paper_forward: mean_abs=0.4428752064704895, max_abs=2.78125, mean_rel=0.14682045578956604, max_rel=936.649658203125, norm_rel=0.022173944860696793, ref_abs_avg=20.000934600830078, test_abs_avg=20.005443572998047
production_forward grad[50] vs paper_forward: mean_abs=0.4093773365020752, max_abs=1.625, mean_rel=0.124882772564888, max_rel=12.301403045654297, norm_rel=0.024242153391242027, ref_abs_avg=17.372665405273438, test_abs_avg=17.365013122558594
production_forward grad[51] vs paper_forward: mean_abs=0.508151650428772, max_abs=3.25, mean_rel=0.15595810115337372, max_rel=982.3922119140625, norm_rel=0.0240899957716465, ref_abs_avg=21.167583465576172, test_abs_avg=21.169757843017578
production_forward grad[52] vs paper_forward: mean_abs=0.49613773822784424, max_abs=3.75, mean_rel=0.16126713156700134, max_rel=971.9232177734375, norm_rel=0.023687204346060753, ref_abs_avg=21.039630889892578, test_abs_avg=21.035682678222656
production_forward grad[53] vs paper_forward: mean_abs=0.3819170594215393, max_abs=1.625, mean_rel=0.09446489810943604, max_rel=9.293723106384277, norm_rel=0.023977885022759438, ref_abs_avg=15.749303817749023, test_abs_avg=15.775460243225098
production_forward grad[54] vs paper_forward: mean_abs=0.4701705276966095, max_abs=3.125, mean_rel=0.1534823477268219, max_rel=730.8773803710938, norm_rel=0.02353391796350479, ref_abs_avg=19.983936309814453, test_abs_avg=19.98444366455078
production_forward grad[55] vs paper_forward: mean_abs=0.46038615703582764, max_abs=3.3125, mean_rel=0.14904285967350006, max_rel=770.4989013671875, norm_rel=0.023338589817285538, ref_abs_avg=19.75372886657715, test_abs_avg=19.757410049438477
production_forward grad[56] vs paper_forward: mean_abs=0.3604978322982788, max_abs=1.5, mean_rel=0.1956511288881302, max_rel=58.25556945800781, norm_rel=0.023957176133990288, ref_abs_avg=15.12527084350586, test_abs_avg=15.123680114746094
production_forward grad[57] vs paper_forward: mean_abs=0.43454647064208984, max_abs=3.5, mean_rel=0.15621241927146912, max_rel=905.7435302734375, norm_rel=0.023229943588376045, ref_abs_avg=18.735370635986328, test_abs_avg=18.73577880859375
production_forward grad[58] vs paper_forward: mean_abs=0.4289565086364746, max_abs=2.75, mean_rel=0.14808382093906403, max_rel=711.7032470703125, norm_rel=0.023162195459008217, ref_abs_avg=18.54763412475586, test_abs_avg=18.541818618774414
production_forward grad[59] vs paper_forward: mean_abs=0.3630564212799072, max_abs=1.625, mean_rel=0.2009519785642624, max_rel=63.85763168334961, norm_rel=0.02404666878283024, ref_abs_avg=15.525550842285156, test_abs_avg=15.501886367797852
production_forward grad[60] vs paper_forward: mean_abs=0.4074943959712982, max_abs=3.0, mean_rel=0.14658495783805847, max_rel=782.806884765625, norm_rel=0.0229000486433506, ref_abs_avg=17.805152893066406, test_abs_avg=17.805389404296875
production_forward grad[61] vs paper_forward: mean_abs=0.4005877375602722, max_abs=3.2578125, mean_rel=0.1415397673845291, max_rel=698.7963256835938, norm_rel=0.02240418642759323, ref_abs_avg=17.914180755615234, test_abs_avg=17.912952423095703
production_forward grad[62] vs paper_forward: mean_abs=0.3327345848083496, max_abs=1.125, mean_rel=0.12456274777650833, max_rel=22.829017639160156, norm_rel=0.023612599819898605, ref_abs_avg=14.265851974487305, test_abs_avg=14.242568016052246
production_forward grad[63] vs paper_forward: mean_abs=0.38429245352745056, max_abs=2.5, mean_rel=0.1433309018611908, max_rel=934.0419311523438, norm_rel=0.022195644676685333, ref_abs_avg=17.302536010742188, test_abs_avg=17.301801681518555
production_forward grad[64] vs paper_forward: mean_abs=0.37825632095336914, max_abs=2.53125, mean_rel=0.14480635523796082, max_rel=487.42633056640625, norm_rel=0.0218936949968338, ref_abs_avg=17.250995635986328, test_abs_avg=17.246328353881836
production_forward grad[65] vs paper_forward: mean_abs=0.3009665012359619, max_abs=1.375, mean_rel=0.4267945885658264, max_rel=165.4528045654297, norm_rel=0.021619804203510284, ref_abs_avg=13.748781204223633, test_abs_avg=13.752538681030273
production_forward grad[66] vs paper_forward: mean_abs=0.36795586347579956, max_abs=2.5, mean_rel=0.13892173767089844, max_rel=545.4388427734375, norm_rel=0.02199239656329155, ref_abs_avg=16.734508514404297, test_abs_avg=16.734813690185547
production_forward grad[67] vs paper_forward: mean_abs=0.360576331615448, max_abs=3.0, mean_rel=0.139458566904068, max_rel=648.3123168945312, norm_rel=0.021504733711481094, ref_abs_avg=16.745676040649414, test_abs_avg=16.74709129333496
production_forward grad[68] vs paper_forward: mean_abs=0.2711678743362427, max_abs=1.125, mean_rel=0.06255950778722763, max_rel=3.5580718517303467, norm_rel=0.01938679814338684, ref_abs_avg=14.521284103393555, test_abs_avg=14.51998519897461
production_forward grad[69] vs paper_forward: mean_abs=0.34803494811058044, max_abs=2.5, mean_rel=0.1349494457244873, max_rel=827.7999877929688, norm_rel=0.02148609608411789, ref_abs_avg=16.159770965576172, test_abs_avg=16.15912628173828
production_forward grad[70] vs paper_forward: mean_abs=0.34202826023101807, max_abs=2.625, mean_rel=0.13893896341323853, max_rel=369.3607177734375, norm_rel=0.02177419513463974, ref_abs_avg=15.725704193115234, test_abs_avg=15.72855281829834
production_forward grad[71] vs paper_forward: mean_abs=0.2658175230026245, max_abs=1.0, mean_rel=0.12420549243688583, max_rel=9.416170120239258, norm_rel=0.02016778476536274, ref_abs_avg=13.35358715057373, test_abs_avg=13.364531517028809
production_forward grad[72] vs paper_forward: mean_abs=0.32838189601898193, max_abs=2.5, mean_rel=0.1333709955215454, max_rel=722.807861328125, norm_rel=0.02121066488325596, ref_abs_avg=15.475139617919922, test_abs_avg=15.474652290344238
production_forward grad[73] vs paper_forward: mean_abs=0.3260168731212616, max_abs=2.5, mean_rel=0.13586924970149994, max_rel=709.5079345703125, norm_rel=0.021085649728775024, ref_abs_avg=15.467032432556152, test_abs_avg=15.468060493469238
production_forward grad[74] vs paper_forward: mean_abs=0.31502723693847656, max_abs=1.1875, mean_rel=0.07642880082130432, max_rel=2.4192583560943604, norm_rel=0.02246757037937641, ref_abs_avg=13.402661323547363, test_abs_avg=13.43685245513916
production_forward grad[75] vs paper_forward: mean_abs=0.37032604217529297, max_abs=2.53125, mean_rel=0.1477668583393097, max_rel=595.4608154296875, norm_rel=0.022945133969187737, ref_abs_avg=16.168275833129883, test_abs_avg=16.169633865356445
production_forward grad[76] vs paper_forward: mean_abs=0.3627135157585144, max_abs=2.5, mean_rel=0.14586731791496277, max_rel=555.7822875976562, norm_rel=0.022802244871854782, ref_abs_avg=15.932918548583984, test_abs_avg=15.932069778442383
production_forward grad[77] vs paper_forward: mean_abs=0.28574156761169434, max_abs=1.375, mean_rel=0.07997793704271317, max_rel=13.393628120422363, norm_rel=0.022620392963290215, ref_abs_avg=13.234760284423828, test_abs_avg=13.209275245666504
production_forward grad[78] vs paper_forward: mean_abs=0.34735941886901855, max_abs=2.75, mean_rel=0.14315210282802582, max_rel=709.3226318359375, norm_rel=0.022518280893564224, ref_abs_avg=15.43704605102539, test_abs_avg=15.43803596496582
production_forward grad[79] vs paper_forward: mean_abs=0.3382743000984192, max_abs=2.75, mean_rel=0.14308759570121765, max_rel=687.7850952148438, norm_rel=0.022279461845755577, ref_abs_avg=15.251541137695312, test_abs_avg=15.249334335327148
production_forward grad[80] vs paper_forward: mean_abs=0.2771111726760864, max_abs=1.375, mean_rel=0.08772184699773788, max_rel=5.211363315582275, norm_rel=0.022865410894155502, ref_abs_avg=12.196866989135742, test_abs_avg=12.191183090209961
production_forward grad[81] vs paper_forward: mean_abs=0.32136738300323486, max_abs=2.625, mean_rel=0.13417181372642517, max_rel=590.417236328125, norm_rel=0.021839849650859833, ref_abs_avg=14.734294891357422, test_abs_avg=14.73484992980957
production_forward grad[82] vs paper_forward: mean_abs=0.31454771757125854, max_abs=2.75, mean_rel=0.12986059486865997, max_rel=641.9486083984375, norm_rel=0.021812524646520615, ref_abs_avg=14.464713096618652, test_abs_avg=14.459375381469727
production_forward grad[83] vs paper_forward: mean_abs=0.2476634979248047, max_abs=1.0, mean_rel=0.09113024175167084, max_rel=15.026657104492188, norm_rel=0.021422404795885086, ref_abs_avg=11.964437484741211, test_abs_avg=11.991750717163086
production_forward grad[84] vs paper_forward: mean_abs=0.2963438332080841, max_abs=2.625, mean_rel=0.130741685628891, max_rel=475.3520202636719, norm_rel=0.021276302635669708, ref_abs_avg=13.947547912597656, test_abs_avg=13.947782516479492
production_forward grad[85] vs paper_forward: mean_abs=0.2849269509315491, max_abs=2.625, mean_rel=0.1288711279630661, max_rel=443.2431945800781, norm_rel=0.02039903774857521, ref_abs_avg=13.978538513183594, test_abs_avg=13.967545509338379
production_forward grad[86] vs paper_forward: mean_abs=0.24621081352233887, max_abs=1.0, mean_rel=0.10132437199354172, max_rel=12.745699882507324, norm_rel=0.021648678928613663, ref_abs_avg=11.549566268920898, test_abs_avg=11.563870429992676
production_forward grad[87] vs paper_forward: mean_abs=0.2828001379966736, max_abs=2.75, mean_rel=0.12546826899051666, max_rel=851.3065795898438, norm_rel=0.020790409296751022, ref_abs_avg=13.658564567565918, test_abs_avg=13.65825080871582
production_forward grad[88] vs paper_forward: mean_abs=0.26986515522003174, max_abs=2.0625, mean_rel=0.12004637718200684, max_rel=564.4966430664062, norm_rel=0.0202300064265728, ref_abs_avg=13.330177307128906, test_abs_avg=13.32546615600586
production_forward grad[89] vs paper_forward: mean_abs=0.2064582109451294, max_abs=0.875, mean_rel=0.06315760314464569, max_rel=7.944154262542725, norm_rel=0.019084149971604347, ref_abs_avg=11.015790939331055, test_abs_avg=11.02751350402832
production_forward grad[90] vs paper_forward: mean_abs=0.2580646276473999, max_abs=3.75, mean_rel=0.12435144186019897, max_rel=838.7919921875, norm_rel=0.020136917009949684, ref_abs_avg=12.911964416503906, test_abs_avg=12.911076545715332
production_forward grad[91] vs paper_forward: mean_abs=0.2524620294570923, max_abs=2.25, mean_rel=0.11075889319181442, max_rel=284.7710876464844, norm_rel=0.01953544095158577, ref_abs_avg=12.991560935974121, test_abs_avg=12.996931076049805
production_forward grad[92] vs paper_forward: mean_abs=0.21195775270462036, max_abs=0.75, mean_rel=0.2842165529727936, max_rel=72.80178833007812, norm_rel=0.019713345915079117, ref_abs_avg=10.871389389038086, test_abs_avg=10.880609512329102
production_forward grad[93] vs paper_forward: mean_abs=0.24160747230052948, max_abs=3.125, mean_rel=0.11723346263170242, max_rel=588.0259399414062, norm_rel=0.019441423937678337, ref_abs_avg=12.559805870056152, test_abs_avg=12.560543060302734
production_forward grad[94] vs paper_forward: mean_abs=0.23467597365379333, max_abs=2.125, mean_rel=0.11765313148498535, max_rel=458.71856689453125, norm_rel=0.01896015927195549, ref_abs_avg=12.483104705810547, test_abs_avg=12.487512588500977
production_forward grad[95] vs paper_forward: mean_abs=0.19258975982666016, max_abs=0.90625, mean_rel=0.08369022607803345, max_rel=7.063965797424316, norm_rel=0.019357167184352875, ref_abs_avg=10.407535552978516, test_abs_avg=10.400495529174805
production_forward grad[96] vs paper_forward: mean_abs=0.23076242208480835, max_abs=2.25, mean_rel=0.12179723381996155, max_rel=533.5557250976562, norm_rel=0.01922941952943802, ref_abs_avg=12.166218757629395, test_abs_avg=12.166206359863281
production_forward grad[97] vs paper_forward: mean_abs=0.22904229164123535, max_abs=2.015625, mean_rel=0.11595648527145386, max_rel=574.4570922851562, norm_rel=0.019068872556090355, ref_abs_avg=12.164737701416016, test_abs_avg=12.159231185913086
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0015942829195410013, max_abs=0.0625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008271429687738419, max_abs=0.3125, mean_rel=0.07288069278001785, max_rel=109.37267303466797, norm_rel=0.019982339814305305, ref_abs_avg=0.44614535570144653, test_abs_avg=0.44614148139953613
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.044621467590332, max_abs=42.0, mean_rel=0.21514572203159332, max_rel=655.1707153320312, norm_rel=0.01998240128159523, ref_abs_avg=217.13296508789062, test_abs_avg=217.16635131835938
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.8194961547851562, max_abs=3.125, mean_rel=0.13120056688785553, max_rel=15.894647598266602, norm_rel=0.024177202954888344, ref_abs_avg=34.60786056518555, test_abs_avg=34.65460205078125
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.0197257995605469, max_abs=7.0, mean_rel=0.17064781486988068, max_rel=1866.2799072265625, norm_rel=0.023351192474365234, ref_abs_avg=43.89195251464844, test_abs_avg=43.89230728149414
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=0.9819649457931519, max_abs=6.0, mean_rel=0.163116917014122, max_rel=812.8980712890625, norm_rel=0.0230193380266428, ref_abs_avg=42.87574768066406, test_abs_avg=42.879737854003906
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.7157301902770996, max_abs=2.5, mean_rel=0.07153420150279999, max_rel=5.038172245025635, norm_rel=0.020407991483807564, ref_abs_avg=35.3297119140625, test_abs_avg=35.322975158691406
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.90022873878479, max_abs=6.0, mean_rel=0.1715860366821289, max_rel=1750.1151123046875, norm_rel=0.02307235635817051, ref_abs_avg=39.208553314208984, test_abs_avg=39.20765686035156
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.8758397698402405, max_abs=5.875, mean_rel=0.15382400155067444, max_rel=936.5254516601562, norm_rel=0.022835249081254005, ref_abs_avg=38.53498840332031, test_abs_avg=38.537681579589844
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.6761374473571777, max_abs=2.625, mean_rel=0.09369726479053497, max_rel=6.5423688888549805, norm_rel=0.022899175062775612, ref_abs_avg=29.047752380371094, test_abs_avg=29.079322814941406
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.8231070041656494, max_abs=5.0, mean_rel=0.1541786938905716, max_rel=1131.17431640625, norm_rel=0.023029975593090057, ref_abs_avg=35.92656326293945, test_abs_avg=35.925086975097656
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8043214082717896, max_abs=5.0, mean_rel=0.16096138954162598, max_rel=2334.71826171875, norm_rel=0.02278279885649681, ref_abs_avg=35.457969665527344, test_abs_avg=35.456932067871094
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6557794809341431, max_abs=2.28125, mean_rel=0.12487231940031052, max_rel=9.645347595214844, norm_rel=0.025897812098264694, ref_abs_avg=25.02871322631836, test_abs_avg=24.969432830810547
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.7622649669647217, max_abs=4.65625, mean_rel=0.1500615030527115, max_rel=2299.6015625, norm_rel=0.022732848301529884, ref_abs_avg=33.68937683105469, test_abs_avg=33.68690490722656
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7449741363525391, max_abs=4.5, mean_rel=0.14754775166511536, max_rel=1126.065185546875, norm_rel=0.02254682220518589, ref_abs_avg=33.183284759521484, test_abs_avg=33.192230224609375
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.6074919700622559, max_abs=2.5625, mean_rel=0.0730556845664978, max_rel=3.800720453262329, norm_rel=0.023834336549043655, ref_abs_avg=25.348655700683594, test_abs_avg=25.29048728942871
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7102035880088806, max_abs=4.375, mean_rel=0.1533624529838562, max_rel=1250.7071533203125, norm_rel=0.022459501400589943, ref_abs_avg=31.795820236206055, test_abs_avg=31.796558380126953
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.694269597530365, max_abs=4.25, mean_rel=0.15502886474132538, max_rel=1051.8040771484375, norm_rel=0.022335903719067574, ref_abs_avg=31.279022216796875, test_abs_avg=31.282499313354492
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.537050724029541, max_abs=2.5, mean_rel=0.12990930676460266, max_rel=11.930731773376465, norm_rel=0.021136023104190826, ref_abs_avg=26.261348724365234, test_abs_avg=26.27349281311035
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.6738162040710449, max_abs=4.75, mean_rel=0.14619050920009613, max_rel=1223.2208251953125, norm_rel=0.022433998063206673, ref_abs_avg=30.179981231689453, test_abs_avg=30.18132972717285
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.6563330888748169, max_abs=4.0, mean_rel=0.1485874354839325, max_rel=702.5281372070312, norm_rel=0.022294940426945686, ref_abs_avg=29.57010269165039, test_abs_avg=29.573715209960938
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5283883810043335, max_abs=1.875, mean_rel=0.11958503723144531, max_rel=23.346120834350586, norm_rel=0.022643497213721275, ref_abs_avg=23.353086471557617, test_abs_avg=23.376955032348633
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6362584233283997, max_abs=3.75, mean_rel=0.1397191435098648, max_rel=1185.6939697265625, norm_rel=0.022219719365239143, ref_abs_avg=28.789886474609375, test_abs_avg=28.79195785522461
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6252763271331787, max_abs=3.75, mean_rel=0.14766168594360352, max_rel=914.4620361328125, norm_rel=0.022019321098923683, ref_abs_avg=28.49123764038086, test_abs_avg=28.491052627563477
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.472719669342041, max_abs=1.75, mean_rel=0.11966145038604736, max_rel=22.475162506103516, norm_rel=0.02013971284031868, ref_abs_avg=23.471450805664062, test_abs_avg=23.452157974243164
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.606020450592041, max_abs=3.75, mean_rel=0.14476332068443298, max_rel=1443.4686279296875, norm_rel=0.02197244204580784, ref_abs_avg=27.709300994873047, test_abs_avg=27.709278106689453
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.591081976890564, max_abs=3.5, mean_rel=0.16389262676239014, max_rel=1566.1680908203125, norm_rel=0.021735791116952896, ref_abs_avg=27.332847595214844, test_abs_avg=27.33232879638672
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6061005592346191, max_abs=2.25, mean_rel=0.22283893823623657, max_rel=32.92258834838867, norm_rel=0.024319138377904892, ref_abs_avg=24.69881248474121, test_abs_avg=24.733617782592773
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.6953321695327759, max_abs=4.125, mean_rel=0.16198158264160156, max_rel=1057.09130859375, norm_rel=0.023861324414610863, ref_abs_avg=29.237552642822266, test_abs_avg=29.24079132080078
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.6775085926055908, max_abs=4.25, mean_rel=0.14821025729179382, max_rel=810.4452514648438, norm_rel=0.023638855665922165, ref_abs_avg=28.820735931396484, test_abs_avg=28.824481964111328
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.54400634765625, max_abs=2.5, mean_rel=0.06078605353832245, max_rel=2.6959853172302246, norm_rel=0.023904353380203247, ref_abs_avg=23.081195831298828, test_abs_avg=23.063854217529297
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.646582841873169, max_abs=4.125, mean_rel=0.153868168592453, max_rel=790.3585205078125, norm_rel=0.024028094485402107, ref_abs_avg=26.97092056274414, test_abs_avg=26.972549438476562
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6290814280509949, max_abs=4.6875, mean_rel=0.1462276428937912, max_rel=1524.1025390625, norm_rel=0.023955460637807846, ref_abs_avg=26.347606658935547, test_abs_avg=26.349166870117188
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5078964233398438, max_abs=2.0, mean_rel=0.09325955808162689, max_rel=9.18890380859375, norm_rel=0.02532900497317314, ref_abs_avg=20.52486228942871, test_abs_avg=20.49944496154785
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.6100882291793823, max_abs=4.0, mean_rel=0.1618165671825409, max_rel=1265.4456787109375, norm_rel=0.024108517915010452, ref_abs_avg=25.366289138793945, test_abs_avg=25.3681640625
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.5953510403633118, max_abs=3.625, mean_rel=0.15811756253242493, max_rel=1413.0347900390625, norm_rel=0.024031592532992363, ref_abs_avg=24.845611572265625, test_abs_avg=24.85048484802246
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.4698776602745056, max_abs=1.8125, mean_rel=0.12246868759393692, max_rel=21.012680053710938, norm_rel=0.023291321471333504, ref_abs_avg=20.53549575805664, test_abs_avg=20.564006805419922
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.5664544701576233, max_abs=3.5, mean_rel=0.15585777163505554, max_rel=830.5657348632812, norm_rel=0.023879148066043854, ref_abs_avg=23.78932762145996, test_abs_avg=23.79105567932129
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5570082664489746, max_abs=3.5, mean_rel=0.15511846542358398, max_rel=655.3336791992188, norm_rel=0.023923959583044052, ref_abs_avg=23.34163475036621, test_abs_avg=23.343477249145508
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.4522298574447632, max_abs=2.0, mean_rel=0.1060294359922409, max_rel=6.0440826416015625, norm_rel=0.02342240698635578, ref_abs_avg=18.93120574951172, test_abs_avg=18.917736053466797
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5321840643882751, max_abs=3.5, mean_rel=0.14811569452285767, max_rel=1109.7733154296875, norm_rel=0.02349473536014557, ref_abs_avg=22.700870513916016, test_abs_avg=22.701515197753906
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5189242362976074, max_abs=3.28125, mean_rel=0.1508009433746338, max_rel=438.0979309082031, norm_rel=0.02314809150993824, ref_abs_avg=22.46453857421875, test_abs_avg=22.467914581298828
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.407396137714386, max_abs=1.625, mean_rel=0.3972439169883728, max_rel=150.99005126953125, norm_rel=0.02340855821967125, ref_abs_avg=17.604896545410156, test_abs_avg=17.606353759765625
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.504304051399231, max_abs=3.0, mean_rel=0.1519457995891571, max_rel=1225.778076171875, norm_rel=0.02330864779651165, ref_abs_avg=21.683080673217773, test_abs_avg=21.681941986083984
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.49486759305000305, max_abs=3.0, mean_rel=0.13896310329437256, max_rel=1052.4722900390625, norm_rel=0.023001806810498238, ref_abs_avg=21.58945655822754, test_abs_avg=21.591373443603516
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.39293575286865234, max_abs=1.5, mean_rel=0.09876187145709991, max_rel=6.516860485076904, norm_rel=0.023258483037352562, ref_abs_avg=16.991310119628906, test_abs_avg=16.997543334960938
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.4788031578063965, max_abs=3.5, mean_rel=0.15110069513320923, max_rel=845.9212036132812, norm_rel=0.022979706525802612, ref_abs_avg=20.876461029052734, test_abs_avg=20.876588821411133
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.4697062075138092, max_abs=3.0, mean_rel=0.14802983403205872, max_rel=1100.1236572265625, norm_rel=0.022899622097611427, ref_abs_avg=20.56955337524414, test_abs_avg=20.56817626953125
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.36499786376953125, max_abs=1.40625, mean_rel=0.06938749551773071, max_rel=2.7905890941619873, norm_rel=0.021484099328517914, ref_abs_avg=17.271747589111328, test_abs_avg=17.269142150878906
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.45642393827438354, max_abs=3.25, mean_rel=0.1513064205646515, max_rel=677.87060546875, norm_rel=0.022644784301519394, ref_abs_avg=20.1634578704834, test_abs_avg=20.163665771484375
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.45047587156295776, max_abs=3.0, mean_rel=0.1447611153125763, max_rel=543.5665893554688, norm_rel=0.02252248115837574, ref_abs_avg=20.000934600830078, test_abs_avg=20.004287719726562
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.4143052101135254, max_abs=2.0, mean_rel=0.12190823256969452, max_rel=9.913264274597168, norm_rel=0.024135077372193336, ref_abs_avg=17.372665405273438, test_abs_avg=17.37584686279297
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5169955492019653, max_abs=4.0, mean_rel=0.15886840224266052, max_rel=935.6275634765625, norm_rel=0.024495471268892288, ref_abs_avg=21.167583465576172, test_abs_avg=21.170181274414062
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5044755935668945, max_abs=3.75, mean_rel=0.16531100869178772, max_rel=874.7304077148438, norm_rel=0.02406064234673977, ref_abs_avg=21.039630889892578, test_abs_avg=21.035701751708984
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.4006829261779785, max_abs=1.625, mean_rel=0.11704792082309723, max_rel=17.151456832885742, norm_rel=0.024883469566702843, ref_abs_avg=15.749303817749023, test_abs_avg=15.758994102478027
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.4773494601249695, max_abs=3.25, mean_rel=0.15533152222633362, max_rel=1294.8912353515625, norm_rel=0.02387869358062744, ref_abs_avg=19.983936309814453, test_abs_avg=19.984228134155273
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.4671669006347656, max_abs=3.75, mean_rel=0.15239983797073364, max_rel=705.4781494140625, norm_rel=0.023678075522184372, ref_abs_avg=19.75372886657715, test_abs_avg=19.75562286376953
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.36722278594970703, max_abs=1.5, mean_rel=0.3962283432483673, max_rel=145.94595336914062, norm_rel=0.024002529680728912, ref_abs_avg=15.12527084350586, test_abs_avg=15.116647720336914
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.4399629235267639, max_abs=3.375, mean_rel=0.15822915732860565, max_rel=904.2421875, norm_rel=0.02352873980998993, ref_abs_avg=18.735370635986328, test_abs_avg=18.735349655151367
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.4350185990333557, max_abs=2.75, mean_rel=0.15335716307163239, max_rel=949.0010375976562, norm_rel=0.023510878905653954, ref_abs_avg=18.54763412475586, test_abs_avg=18.540904998779297
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.36159729957580566, max_abs=1.5625, mean_rel=0.41965723037719727, max_rel=175.43756103515625, norm_rel=0.02391180954873562, ref_abs_avg=15.525550842285156, test_abs_avg=15.509973526000977
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.4131157696247101, max_abs=2.875, mean_rel=0.14812475442886353, max_rel=904.0411987304688, norm_rel=0.02322796918451786, ref_abs_avg=17.805152893066406, test_abs_avg=17.805200576782227
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.40807437896728516, max_abs=2.75, mean_rel=0.14512309432029724, max_rel=896.9793090820312, norm_rel=0.022822171449661255, ref_abs_avg=17.914180755615234, test_abs_avg=17.912551879882812
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.3385024070739746, max_abs=1.25, mean_rel=0.10426659882068634, max_rel=11.276114463806152, norm_rel=0.023862706497311592, ref_abs_avg=14.265851974487305, test_abs_avg=14.257218360900879
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.3895832896232605, max_abs=2.625, mean_rel=0.1441894769668579, max_rel=735.4903564453125, norm_rel=0.022486958652734756, ref_abs_avg=17.302536010742188, test_abs_avg=17.301677703857422
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.3827182650566101, max_abs=2.640625, mean_rel=0.1465478241443634, max_rel=532.5355834960938, norm_rel=0.022140463814139366, ref_abs_avg=17.250995635986328, test_abs_avg=17.244136810302734
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.31019163131713867, max_abs=1.40625, mean_rel=0.434267520904541, max_rel=131.4352264404297, norm_rel=0.022352855652570724, ref_abs_avg=13.748781204223633, test_abs_avg=13.768707275390625
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.37207192182540894, max_abs=2.625, mean_rel=0.14073628187179565, max_rel=523.7177124023438, norm_rel=0.022224174812436104, ref_abs_avg=16.734508514404297, test_abs_avg=16.734786987304688
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.3648175597190857, max_abs=2.5, mean_rel=0.14036215841770172, max_rel=609.809326171875, norm_rel=0.02174662984907627, ref_abs_avg=16.745676040649414, test_abs_avg=16.74539566040039
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.2856006622314453, max_abs=1.125, mean_rel=0.07461261749267578, max_rel=4.060184001922607, norm_rel=0.020516565069556236, ref_abs_avg=14.521284103393555, test_abs_avg=14.533607482910156
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.35136836767196655, max_abs=2.75, mean_rel=0.13683879375457764, max_rel=774.0670166015625, norm_rel=0.021687466651201248, ref_abs_avg=16.159770965576172, test_abs_avg=16.158981323242188
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.3447304964065552, max_abs=2.625, mean_rel=0.1406894326210022, max_rel=347.35546875, norm_rel=0.021906446665525436, ref_abs_avg=15.725704193115234, test_abs_avg=15.728160858154297
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.2665797472000122, max_abs=1.25, mean_rel=0.1087932288646698, max_rel=8.853170394897461, norm_rel=0.020228080451488495, ref_abs_avg=13.35358715057373, test_abs_avg=13.353281021118164
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3309684991836548, max_abs=2.5, mean_rel=0.13443787395954132, max_rel=687.1018676757812, norm_rel=0.021383536979556084, ref_abs_avg=15.475139617919922, test_abs_avg=15.474660873413086
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.32788074016571045, max_abs=2.5, mean_rel=0.13526254892349243, max_rel=523.40869140625, norm_rel=0.021186525002121925, ref_abs_avg=15.467032432556152, test_abs_avg=15.466859817504883
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.2983860969543457, max_abs=1.125, mean_rel=0.07262548804283142, max_rel=3.0572619438171387, norm_rel=0.021842189133167267, ref_abs_avg=13.402661323547363, test_abs_avg=13.429910659790039
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.375387966632843, max_abs=2.5, mean_rel=0.14808934926986694, max_rel=611.7772216796875, norm_rel=0.023261072114109993, ref_abs_avg=16.168275833129883, test_abs_avg=16.169170379638672
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.36758726835250854, max_abs=2.5, mean_rel=0.1438131332397461, max_rel=452.1934814453125, norm_rel=0.023087961599230766, ref_abs_avg=15.932918548583984, test_abs_avg=15.931586265563965
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.2784433364868164, max_abs=1.25, mean_rel=0.07387125492095947, max_rel=10.63294506072998, norm_rel=0.022295592352747917, ref_abs_avg=13.234760284423828, test_abs_avg=13.224546432495117
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.3514876365661621, max_abs=2.75, mean_rel=0.14400270581245422, max_rel=731.3722534179688, norm_rel=0.022760745137929916, ref_abs_avg=15.43704605102539, test_abs_avg=15.438028335571289
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.344965398311615, max_abs=2.5, mean_rel=0.14325883984565735, max_rel=492.5002746582031, norm_rel=0.022659851238131523, ref_abs_avg=15.251541137695312, test_abs_avg=15.25248908996582
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.28518885374069214, max_abs=1.375, mean_rel=0.08858144283294678, max_rel=5.502229690551758, norm_rel=0.023449620231986046, ref_abs_avg=12.196866989135742, test_abs_avg=12.19361686706543
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.32507818937301636, max_abs=2.5, mean_rel=0.13656195998191833, max_rel=758.1618041992188, norm_rel=0.022076262161135674, ref_abs_avg=14.734294891357422, test_abs_avg=14.735403060913086
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.31650760769844055, max_abs=2.875, mean_rel=0.1296456754207611, max_rel=558.6161499023438, norm_rel=0.021910294890403748, ref_abs_avg=14.464713096618652, test_abs_avg=14.462465286254883
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.2514481544494629, max_abs=0.9375, mean_rel=0.08678177744150162, max_rel=11.710877418518066, norm_rel=0.021514039486646652, ref_abs_avg=11.964437484741211, test_abs_avg=11.985929489135742
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.29945310950279236, max_abs=2.875, mean_rel=0.13190816342830658, max_rel=651.3800659179688, norm_rel=0.021505527198314667, ref_abs_avg=13.947547912597656, test_abs_avg=13.947900772094727
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.2911316454410553, max_abs=2.75, mean_rel=0.13317926228046417, max_rel=338.2244567871094, norm_rel=0.02088852971792221, ref_abs_avg=13.978538513183594, test_abs_avg=13.970452308654785
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.2394852638244629, max_abs=0.875, mean_rel=0.08146718144416809, max_rel=5.15779972076416, norm_rel=0.020879756659269333, ref_abs_avg=11.549566268920898, test_abs_avg=11.55423355102539
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.28522172570228577, max_abs=2.62109375, mean_rel=0.12865722179412842, max_rel=871.3289184570312, norm_rel=0.020947325974702835, ref_abs_avg=13.658564567565918, test_abs_avg=13.65835189819336
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.2707936763763428, max_abs=2.25, mean_rel=0.12208689004182816, max_rel=583.1900024414062, norm_rel=0.020280752331018448, ref_abs_avg=13.330177307128906, test_abs_avg=13.327661514282227
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.21161723136901855, max_abs=0.875, mean_rel=0.05830042064189911, max_rel=2.2207610607147217, norm_rel=0.019386157393455505, ref_abs_avg=11.015790939331055, test_abs_avg=11.02048397064209
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.259792685508728, max_abs=3.25, mean_rel=0.12401086091995239, max_rel=962.2448120117188, norm_rel=0.020248519256711006, ref_abs_avg=12.911964416503906, test_abs_avg=12.911077499389648
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.25629109144210815, max_abs=2.125, mean_rel=0.11476930975914001, max_rel=348.46624755859375, norm_rel=0.019792599603533745, ref_abs_avg=12.991560935974121, test_abs_avg=12.994739532470703
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.20827731490135193, max_abs=0.78125, mean_rel=0.3088628053665161, max_rel=71.21927642822266, norm_rel=0.01905430108308792, ref_abs_avg=10.871389389038086, test_abs_avg=10.877717018127441
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.24266251921653748, max_abs=3.375, mean_rel=0.1157568097114563, max_rel=307.9455261230469, norm_rel=0.01953381672501564, ref_abs_avg=12.559805870056152, test_abs_avg=12.560331344604492
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.23713764548301697, max_abs=2.125, mean_rel=0.11773388087749481, max_rel=507.3105163574219, norm_rel=0.019181905314326286, ref_abs_avg=12.483104705810547, test_abs_avg=12.490697860717773
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.1932021975517273, max_abs=0.9375, mean_rel=0.07591329514980316, max_rel=6.341193675994873, norm_rel=0.01916508749127388, ref_abs_avg=10.407535552978516, test_abs_avg=10.40371322631836
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.2315489649772644, max_abs=2.41796875, mean_rel=0.12168953567743301, max_rel=460.31884765625, norm_rel=0.01930070109665394, ref_abs_avg=12.166218757629395, test_abs_avg=12.166557312011719
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.22922348976135254, max_abs=2.328125, mean_rel=0.11995293200016022, max_rel=878.4552612304688, norm_rel=0.019138922914862633, ref_abs_avg=12.164737701416016, test_abs_avg=12.159709930419922
liger_forward vs paper_forward output: mean_abs=0.00014548572653438896, max_abs=0.03125
liger_forward grad[0] vs paper_forward: mean_abs=0.0034358638804405928, max_abs=0.21875, mean_rel=0.025366216897964478, max_rel=86.20480346679688, norm_rel=0.009588065557181835, ref_abs_avg=0.44614535570144653, test_abs_avg=0.4461209177970886
liger_forward grad[1] vs paper_forward: mean_abs=1.5030238628387451, max_abs=16.0, mean_rel=0.06391303241252899, max_rel=192.16537475585938, norm_rel=0.006300443783402443, ref_abs_avg=217.13296508789062, test_abs_avg=217.1306610107422
liger_forward grad[2] vs paper_forward: mean_abs=0.27431392669677734, max_abs=1.4375, mean_rel=0.034183479845523834, max_rel=3.1618449687957764, norm_rel=0.00849912129342556, ref_abs_avg=34.60786056518555, test_abs_avg=34.596923828125
liger_forward grad[3] vs paper_forward: mean_abs=0.36685505509376526, max_abs=2.5, mean_rel=0.061314523220062256, max_rel=855.4686889648438, norm_rel=0.008682915940880775, ref_abs_avg=43.89195251464844, test_abs_avg=43.892269134521484
liger_forward grad[4] vs paper_forward: mean_abs=0.3505876064300537, max_abs=2.25, mean_rel=0.05589959770441055, max_rel=336.7179260253906, norm_rel=0.008507324382662773, ref_abs_avg=42.87574768066406, test_abs_avg=42.87615966796875
liger_forward grad[5] vs paper_forward: mean_abs=0.27752685546875, max_abs=1.015625, mean_rel=0.02142317034304142, max_rel=0.617542028427124, norm_rel=0.008080449886620045, ref_abs_avg=35.3297119140625, test_abs_avg=35.317291259765625
liger_forward grad[6] vs paper_forward: mean_abs=0.3196447789669037, max_abs=2.25, mean_rel=0.06179676949977875, max_rel=792.5608520507812, norm_rel=0.008474711328744888, ref_abs_avg=39.208553314208984, test_abs_avg=39.207733154296875
liger_forward grad[7] vs paper_forward: mean_abs=0.30669915676116943, max_abs=2.0, mean_rel=0.05702417716383934, max_rel=402.33349609375, norm_rel=0.008289682678878307, ref_abs_avg=38.53498840332031, test_abs_avg=38.535404205322266
liger_forward grad[8] vs paper_forward: mean_abs=0.2552485466003418, max_abs=1.0, mean_rel=0.03730875253677368, max_rel=1.0053915977478027, norm_rel=0.008868787437677383, ref_abs_avg=29.047752380371094, test_abs_avg=29.06572723388672
liger_forward grad[9] vs paper_forward: mean_abs=0.28892654180526733, max_abs=2.5, mean_rel=0.05537717044353485, max_rel=594.2557983398438, norm_rel=0.008379369974136353, ref_abs_avg=35.92656326293945, test_abs_avg=35.926239013671875
liger_forward grad[10] vs paper_forward: mean_abs=0.27825433015823364, max_abs=2.0, mean_rel=0.0532442070543766, max_rel=512.2155151367188, norm_rel=0.008191164582967758, ref_abs_avg=35.457969665527344, test_abs_avg=35.457801818847656
liger_forward grad[11] vs paper_forward: mean_abs=0.21764886379241943, max_abs=1.0, mean_rel=0.07872742414474487, max_rel=18.477582931518555, norm_rel=0.009000315330922604, ref_abs_avg=25.02871322631836, test_abs_avg=25.025089263916016
liger_forward grad[12] vs paper_forward: mean_abs=0.2642476558685303, max_abs=1.75, mean_rel=0.0527828186750412, max_rel=504.33453369140625, norm_rel=0.008181742392480373, ref_abs_avg=33.68937683105469, test_abs_avg=33.68934631347656
liger_forward grad[13] vs paper_forward: mean_abs=0.2551153004169464, max_abs=1.5, mean_rel=0.05075118690729141, max_rel=301.6466979980469, norm_rel=0.00803444441407919, ref_abs_avg=33.183284759521484, test_abs_avg=33.182106018066406
liger_forward grad[14] vs paper_forward: mean_abs=0.2030353546142578, max_abs=1.0, mean_rel=0.024223795160651207, max_rel=0.8559256196022034, norm_rel=0.008206740021705627, ref_abs_avg=25.348655700683594, test_abs_avg=25.342363357543945
liger_forward grad[15] vs paper_forward: mean_abs=0.24369877576828003, max_abs=1.5, mean_rel=0.05090666189789772, max_rel=496.9947814941406, norm_rel=0.008014061488211155, ref_abs_avg=31.795820236206055, test_abs_avg=31.79643440246582
liger_forward grad[16] vs paper_forward: mean_abs=0.23548266291618347, max_abs=1.5, mean_rel=0.0554736852645874, max_rel=521.1410522460938, norm_rel=0.00788919534534216, ref_abs_avg=31.279022216796875, test_abs_avg=31.278276443481445
liger_forward grad[17] vs paper_forward: mean_abs=0.193597674369812, max_abs=0.75, mean_rel=0.04461156204342842, max_rel=7.3177409172058105, norm_rel=0.007909857667982578, ref_abs_avg=26.261348724365234, test_abs_avg=26.262344360351562
liger_forward grad[18] vs paper_forward: mean_abs=0.22998321056365967, max_abs=1.5, mean_rel=0.05025976896286011, max_rel=629.201171875, norm_rel=0.007978854700922966, ref_abs_avg=30.179981231689453, test_abs_avg=30.18014144897461
liger_forward grad[19] vs paper_forward: mean_abs=0.22033685445785522, max_abs=1.5, mean_rel=0.048260994255542755, max_rel=389.5520324707031, norm_rel=0.007809133268892765, ref_abs_avg=29.57010269165039, test_abs_avg=29.569353103637695
liger_forward grad[20] vs paper_forward: mean_abs=0.1759854555130005, max_abs=0.75, mean_rel=0.09041056036949158, max_rel=31.392013549804688, norm_rel=0.008080882951617241, ref_abs_avg=23.353086471557617, test_abs_avg=23.373687744140625
liger_forward grad[21] vs paper_forward: mean_abs=0.21441853046417236, max_abs=1.5, mean_rel=0.04780910164117813, max_rel=518.2257690429688, norm_rel=0.007812149357050657, ref_abs_avg=28.789886474609375, test_abs_avg=28.789827346801758
liger_forward grad[22] vs paper_forward: mean_abs=0.2073376625776291, max_abs=1.5, mean_rel=0.053806085139513016, max_rel=447.3649597167969, norm_rel=0.007646182086318731, ref_abs_avg=28.49123764038086, test_abs_avg=28.490482330322266
liger_forward grad[23] vs paper_forward: mean_abs=0.16502904891967773, max_abs=0.75, mean_rel=0.03987034410238266, max_rel=4.411881446838379, norm_rel=0.007587968371808529, ref_abs_avg=23.471450805664062, test_abs_avg=23.470298767089844
liger_forward grad[24] vs paper_forward: mean_abs=0.2030518352985382, max_abs=1.375, mean_rel=0.04790477827191353, max_rel=272.1305236816406, norm_rel=0.0076853614300489426, ref_abs_avg=27.709300994873047, test_abs_avg=27.709369659423828
liger_forward grad[25] vs paper_forward: mean_abs=0.19491754472255707, max_abs=1.5, mean_rel=0.05615641921758652, max_rel=403.70098876953125, norm_rel=0.007507335860282183, ref_abs_avg=27.332847595214844, test_abs_avg=27.33339500427246
liger_forward grad[26] vs paper_forward: mean_abs=0.19098639488220215, max_abs=0.75, mean_rel=0.08499003946781158, max_rel=16.755582809448242, norm_rel=0.007940521463751793, ref_abs_avg=24.69881248474121, test_abs_avg=24.703811645507812
liger_forward grad[27] vs paper_forward: mean_abs=0.22164157032966614, max_abs=1.5, mean_rel=0.05230463668704033, max_rel=379.4283752441406, norm_rel=0.007928785867989063, ref_abs_avg=29.237552642822266, test_abs_avg=29.237106323242188
liger_forward grad[28] vs paper_forward: mean_abs=0.21415932476520538, max_abs=1.5, mean_rel=0.048684339970350266, max_rel=346.8150939941406, norm_rel=0.00780864991247654, ref_abs_avg=28.820735931396484, test_abs_avg=28.8210391998291
liger_forward grad[29] vs paper_forward: mean_abs=0.16415023803710938, max_abs=0.75, mean_rel=0.01843397133052349, max_rel=0.46236497163772583, norm_rel=0.007362948730587959, ref_abs_avg=23.081195831298828, test_abs_avg=23.089149475097656
liger_forward grad[30] vs paper_forward: mean_abs=0.1982085406780243, max_abs=1.5, mean_rel=0.0494203194975853, max_rel=417.1809387207031, norm_rel=0.007700205314904451, ref_abs_avg=26.97092056274414, test_abs_avg=26.970247268676758
liger_forward grad[31] vs paper_forward: mean_abs=0.1899581253528595, max_abs=1.25, mean_rel=0.042590055614709854, max_rel=147.45228576660156, norm_rel=0.007582777179777622, ref_abs_avg=26.347606658935547, test_abs_avg=26.347421646118164
liger_forward grad[32] vs paper_forward: mean_abs=0.15182971954345703, max_abs=0.75, mean_rel=0.02580070309340954, max_rel=2.426743268966675, norm_rel=0.008087350986897945, ref_abs_avg=20.52486228942871, test_abs_avg=20.534549713134766
liger_forward grad[33] vs paper_forward: mean_abs=0.18467773497104645, max_abs=1.5, mean_rel=0.0517372228205204, max_rel=569.27099609375, norm_rel=0.007648961152881384, ref_abs_avg=25.366289138793945, test_abs_avg=25.366039276123047
liger_forward grad[34] vs paper_forward: mean_abs=0.17603223025798798, max_abs=1.0625, mean_rel=0.0463259257376194, max_rel=257.53765869140625, norm_rel=0.007450790144503117, ref_abs_avg=24.845611572265625, test_abs_avg=24.845149993896484
liger_forward grad[35] vs paper_forward: mean_abs=0.1449439525604248, max_abs=0.5390625, mean_rel=0.05728225037455559, max_rel=15.38094711303711, norm_rel=0.007365631405264139, ref_abs_avg=20.53549575805664, test_abs_avg=20.533586502075195
liger_forward grad[36] vs paper_forward: mean_abs=0.16816963255405426, max_abs=1.0, mean_rel=0.04692879319190979, max_rel=354.6116638183594, norm_rel=0.007432394195348024, ref_abs_avg=23.78932762145996, test_abs_avg=23.78917694091797
liger_forward grad[37] vs paper_forward: mean_abs=0.1622089445590973, max_abs=1.25, mean_rel=0.04413135349750519, max_rel=241.23080444335938, norm_rel=0.00731820659711957, ref_abs_avg=23.34163475036621, test_abs_avg=23.34128189086914
liger_forward grad[38] vs paper_forward: mean_abs=0.13319730758666992, max_abs=0.625, mean_rel=0.034287940710783005, max_rel=2.3078174591064453, norm_rel=0.0072731804102659225, ref_abs_avg=18.93120574951172, test_abs_avg=18.929595947265625
liger_forward grad[39] vs paper_forward: mean_abs=0.1571306586265564, max_abs=1.0, mean_rel=0.043193940073251724, max_rel=270.1410217285156, norm_rel=0.007285160943865776, ref_abs_avg=22.700870513916016, test_abs_avg=22.700992584228516
liger_forward grad[40] vs paper_forward: mean_abs=0.15168005228042603, max_abs=1.0, mean_rel=0.04368743300437927, max_rel=231.591552734375, norm_rel=0.007130420301109552, ref_abs_avg=22.46453857421875, test_abs_avg=22.4644775390625
liger_forward grad[41] vs paper_forward: mean_abs=0.11716204881668091, max_abs=0.5625, mean_rel=0.06566909700632095, max_rel=20.57461166381836, norm_rel=0.007308740634471178, ref_abs_avg=17.604896545410156, test_abs_avg=17.605628967285156
liger_forward grad[42] vs paper_forward: mean_abs=0.1478673815727234, max_abs=1.0, mean_rel=0.04518469423055649, max_rel=383.5179748535156, norm_rel=0.00718793785199523, ref_abs_avg=21.683080673217773, test_abs_avg=21.683452606201172
liger_forward grad[43] vs paper_forward: mean_abs=0.14201901853084564, max_abs=1.0, mean_rel=0.04280603677034378, max_rel=302.8645324707031, norm_rel=0.006968621164560318, ref_abs_avg=21.58945655822754, test_abs_avg=21.589920043945312
liger_forward grad[44] vs paper_forward: mean_abs=0.11601638793945312, max_abs=0.5, mean_rel=0.022142289206385612, max_rel=0.8347569108009338, norm_rel=0.007185765076428652, ref_abs_avg=16.991310119628906, test_abs_avg=16.98781967163086
liger_forward grad[45] vs paper_forward: mean_abs=0.139835923910141, max_abs=1.0, mean_rel=0.044319041073322296, max_rel=297.6817321777344, norm_rel=0.007063065655529499, ref_abs_avg=20.876461029052734, test_abs_avg=20.87627601623535
liger_forward grad[46] vs paper_forward: mean_abs=0.1352616846561432, max_abs=1.0, mean_rel=0.0425577238202095, max_rel=268.09649658203125, norm_rel=0.0069703347980976105, ref_abs_avg=20.56955337524414, test_abs_avg=20.56987953186035
liger_forward grad[47] vs paper_forward: mean_abs=0.11278748512268066, max_abs=0.5, mean_rel=0.020957477390766144, max_rel=0.7096408009529114, norm_rel=0.006843384355306625, ref_abs_avg=17.271747589111328, test_abs_avg=17.27000617980957
liger_forward grad[48] vs paper_forward: mean_abs=0.13235127925872803, max_abs=1.0, mean_rel=0.0438237339258194, max_rel=247.65843200683594, norm_rel=0.006933282129466534, ref_abs_avg=20.1634578704834, test_abs_avg=20.16312026977539
liger_forward grad[49] vs paper_forward: mean_abs=0.12817847728729248, max_abs=1.0, mean_rel=0.041901275515556335, max_rel=188.98114013671875, norm_rel=0.0067946030758321285, ref_abs_avg=20.000934600830078, test_abs_avg=20.00238037109375
liger_forward grad[50] vs paper_forward: mean_abs=0.11973929405212402, max_abs=0.5, mean_rel=0.033581458032131195, max_rel=2.225929021835327, norm_rel=0.007526276167482138, ref_abs_avg=17.372665405273438, test_abs_avg=17.37856674194336
liger_forward grad[51] vs paper_forward: mean_abs=0.1492750346660614, max_abs=1.0, mean_rel=0.045474614948034286, max_rel=239.4606475830078, norm_rel=0.007419525180011988, ref_abs_avg=21.167583465576172, test_abs_avg=21.167139053344727
liger_forward grad[52] vs paper_forward: mean_abs=0.1443389654159546, max_abs=1.0, mean_rel=0.04648296535015106, max_rel=168.93716430664062, norm_rel=0.007248234935104847, ref_abs_avg=21.039630889892578, test_abs_avg=21.03830337524414
liger_forward grad[53] vs paper_forward: mean_abs=0.10940027236938477, max_abs=0.5, mean_rel=0.034930065274238586, max_rel=3.078761339187622, norm_rel=0.007300233468413353, ref_abs_avg=15.749303817749023, test_abs_avg=15.753026008605957
liger_forward grad[54] vs paper_forward: mean_abs=0.13672472536563873, max_abs=1.0, mean_rel=0.043546952307224274, max_rel=174.5267333984375, norm_rel=0.007192965596914291, ref_abs_avg=19.983936309814453, test_abs_avg=19.98404312133789
liger_forward grad[55] vs paper_forward: mean_abs=0.13241028785705566, max_abs=1.0625, mean_rel=0.04305754601955414, max_rel=259.143310546875, norm_rel=0.0070771947503089905, ref_abs_avg=19.75372886657715, test_abs_avg=19.75409698486328
liger_forward grad[56] vs paper_forward: mean_abs=0.11223568022251129, max_abs=0.5, mean_rel=0.22057677805423737, max_rel=93.41442108154297, norm_rel=0.007665428798645735, ref_abs_avg=15.12527084350586, test_abs_avg=15.141921997070312
liger_forward grad[57] vs paper_forward: mean_abs=0.12680339813232422, max_abs=1.0, mean_rel=0.04472244530916214, max_rel=256.2452087402344, norm_rel=0.0071340566501021385, ref_abs_avg=18.735370635986328, test_abs_avg=18.735065460205078
liger_forward grad[58] vs paper_forward: mean_abs=0.12320858240127563, max_abs=1.0, mean_rel=0.04689355194568634, max_rel=294.7950744628906, norm_rel=0.007030038628727198, ref_abs_avg=18.54763412475586, test_abs_avg=18.54865264892578
liger_forward grad[59] vs paper_forward: mean_abs=0.09572434425354004, max_abs=0.4375, mean_rel=0.06616391986608505, max_rel=25.414127349853516, norm_rel=0.006604943890124559, ref_abs_avg=15.525550842285156, test_abs_avg=15.526815414428711
liger_forward grad[60] vs paper_forward: mean_abs=0.11803488433361053, max_abs=1.0, mean_rel=0.04265403747558594, max_rel=209.69932556152344, norm_rel=0.006987665314227343, ref_abs_avg=17.805152893066406, test_abs_avg=17.805177688598633
liger_forward grad[61] vs paper_forward: mean_abs=0.11391372978687286, max_abs=0.75, mean_rel=0.039517953991889954, max_rel=111.71744537353516, norm_rel=0.006742329802364111, ref_abs_avg=17.914180755615234, test_abs_avg=17.915790557861328
liger_forward grad[62] vs paper_forward: mean_abs=0.08160543441772461, max_abs=0.375, mean_rel=0.027915701270103455, max_rel=2.5716171264648438, norm_rel=0.0061377184465527534, ref_abs_avg=14.265851974487305, test_abs_avg=14.262219429016113
liger_forward grad[63] vs paper_forward: mean_abs=0.11079223453998566, max_abs=1.0, mean_rel=0.03965546190738678, max_rel=192.02700805664062, norm_rel=0.006766314618289471, ref_abs_avg=17.302536010742188, test_abs_avg=17.30263900756836
liger_forward grad[64] vs paper_forward: mean_abs=0.10708638280630112, max_abs=1.0, mean_rel=0.04053906351327896, max_rel=197.80517578125, norm_rel=0.006586697418242693, ref_abs_avg=17.250995635986328, test_abs_avg=17.252151489257812
liger_forward grad[65] vs paper_forward: mean_abs=0.09073938429355621, max_abs=0.4375, mean_rel=0.2520086467266083, max_rel=102.63800048828125, norm_rel=0.006960740778595209, ref_abs_avg=13.748781204223633, test_abs_avg=13.747314453125
liger_forward grad[66] vs paper_forward: mean_abs=0.10566059499979019, max_abs=0.75, mean_rel=0.03973956033587456, max_rel=150.54122924804688, norm_rel=0.0066922735422849655, ref_abs_avg=16.734508514404297, test_abs_avg=16.73446273803711
liger_forward grad[67] vs paper_forward: mean_abs=0.10251521319150925, max_abs=1.0, mean_rel=0.038642026484012604, max_rel=189.96983337402344, norm_rel=0.00650258781388402, ref_abs_avg=16.745676040649414, test_abs_avg=16.74606704711914
liger_forward grad[68] vs paper_forward: mean_abs=0.08302950859069824, max_abs=0.375, mean_rel=0.018363619223237038, max_rel=1.1989136934280396, norm_rel=0.0063268570229411125, ref_abs_avg=14.521284103393555, test_abs_avg=14.520318031311035
liger_forward grad[69] vs paper_forward: mean_abs=0.0996774211525917, max_abs=0.75, mean_rel=0.03900940716266632, max_rel=155.6009063720703, norm_rel=0.006533498410135508, ref_abs_avg=16.159770965576172, test_abs_avg=16.15985870361328
liger_forward grad[70] vs paper_forward: mean_abs=0.09624387323856354, max_abs=0.875, mean_rel=0.04108039289712906, max_rel=188.91116333007812, norm_rel=0.006519923452287912, ref_abs_avg=15.725704193115234, test_abs_avg=15.725441932678223
liger_forward grad[71] vs paper_forward: mean_abs=0.07544004917144775, max_abs=0.296875, mean_rel=0.04944396764039993, max_rel=6.14660120010376, norm_rel=0.00605688476935029, ref_abs_avg=13.35358715057373, test_abs_avg=13.35780143737793
liger_forward grad[72] vs paper_forward: mean_abs=0.09389320015907288, max_abs=1.0, mean_rel=0.037962932139635086, max_rel=225.6997528076172, norm_rel=0.00645222095772624, ref_abs_avg=15.475139617919922, test_abs_avg=15.474995613098145
liger_forward grad[73] vs paper_forward: mean_abs=0.0915551632642746, max_abs=1.0, mean_rel=0.038743678480386734, max_rel=118.09088897705078, norm_rel=0.006338456645607948, ref_abs_avg=15.467032432556152, test_abs_avg=15.468040466308594
liger_forward grad[74] vs paper_forward: mean_abs=0.09262657165527344, max_abs=0.375, mean_rel=0.02283278852701187, max_rel=0.8463659286499023, norm_rel=0.007060607895255089, ref_abs_avg=13.402661323547363, test_abs_avg=13.402098655700684
liger_forward grad[75] vs paper_forward: mean_abs=0.10995286703109741, max_abs=1.0, mean_rel=0.04460935294628143, max_rel=169.6084442138672, norm_rel=0.007179523818194866, ref_abs_avg=16.168275833129883, test_abs_avg=16.16827964782715
liger_forward grad[76] vs paper_forward: mean_abs=0.10460986196994781, max_abs=0.75, mean_rel=0.041302911937236786, max_rel=140.90403747558594, norm_rel=0.00694755045697093, ref_abs_avg=15.932918548583984, test_abs_avg=15.932869911193848
liger_forward grad[77] vs paper_forward: mean_abs=0.08245420455932617, max_abs=0.375, mean_rel=0.023591795936226845, max_rel=3.6018290519714355, norm_rel=0.006764369085431099, ref_abs_avg=13.234760284423828, test_abs_avg=13.237435340881348
liger_forward grad[78] vs paper_forward: mean_abs=0.10213543474674225, max_abs=0.75, mean_rel=0.04220924153923988, max_rel=359.4210510253906, norm_rel=0.006983466446399689, ref_abs_avg=15.43704605102539, test_abs_avg=15.437287330627441
liger_forward grad[79] vs paper_forward: mean_abs=0.09774550795555115, max_abs=0.75, mean_rel=0.03891395032405853, max_rel=78.79354095458984, norm_rel=0.0067952778190374374, ref_abs_avg=15.251541137695312, test_abs_avg=15.253100395202637
liger_forward grad[80] vs paper_forward: mean_abs=0.07788845896720886, max_abs=0.296875, mean_rel=0.023128680884838104, max_rel=1.010657787322998, norm_rel=0.006634246092289686, ref_abs_avg=12.196866989135742, test_abs_avg=12.192424774169922
liger_forward grad[81] vs paper_forward: mean_abs=0.09453506022691727, max_abs=0.75, mean_rel=0.04013058543205261, max_rel=205.3446502685547, norm_rel=0.006793274078518152, ref_abs_avg=14.734294891357422, test_abs_avg=14.734105110168457
liger_forward grad[82] vs paper_forward: mean_abs=0.08984778821468353, max_abs=0.75, mean_rel=0.03677087277173996, max_rel=107.69011688232422, norm_rel=0.006611031945794821, ref_abs_avg=14.464713096618652, test_abs_avg=14.465222358703613
liger_forward grad[83] vs paper_forward: mean_abs=0.06916618347167969, max_abs=0.375, mean_rel=0.018311239778995514, max_rel=0.7464442253112793, norm_rel=0.006493386346846819, ref_abs_avg=11.964437484741211, test_abs_avg=11.965543746948242
liger_forward grad[84] vs paper_forward: mean_abs=0.08644641935825348, max_abs=1.0, mean_rel=0.03856252133846283, max_rel=145.79908752441406, norm_rel=0.006587508600205183, ref_abs_avg=13.947547912597656, test_abs_avg=13.947270393371582
liger_forward grad[85] vs paper_forward: mean_abs=0.08449287712574005, max_abs=0.75, mean_rel=0.037782587110996246, max_rel=88.26981353759766, norm_rel=0.006465607788413763, ref_abs_avg=13.978538513183594, test_abs_avg=13.97719955444336
liger_forward grad[86] vs paper_forward: mean_abs=0.07045483589172363, max_abs=0.375, mean_rel=0.022784702479839325, max_rel=2.5356178283691406, norm_rel=0.0066163018345832825, ref_abs_avg=11.549566268920898, test_abs_avg=11.540925025939941
liger_forward grad[87] vs paper_forward: mean_abs=0.08206614851951599, max_abs=0.75, mean_rel=0.036445558071136475, max_rel=215.65463256835938, norm_rel=0.0064115943387150764, ref_abs_avg=13.658564567565918, test_abs_avg=13.658926963806152
liger_forward grad[88] vs paper_forward: mean_abs=0.0799725204706192, max_abs=0.75, mean_rel=0.03683122992515564, max_rel=102.29405975341797, norm_rel=0.006390836089849472, ref_abs_avg=13.330177307128906, test_abs_avg=13.329755783081055
liger_forward grad[89] vs paper_forward: mean_abs=0.06414628028869629, max_abs=0.25, mean_rel=0.01746240258216858, max_rel=0.825237512588501, norm_rel=0.006328655872493982, ref_abs_avg=11.015790939331055, test_abs_avg=11.013349533081055
liger_forward grad[90] vs paper_forward: mean_abs=0.07457252591848373, max_abs=1.0, mean_rel=0.03485208377242088, max_rel=149.6173095703125, norm_rel=0.006204929202795029, ref_abs_avg=12.911964416503906, test_abs_avg=12.91192626953125
liger_forward grad[91] vs paper_forward: mean_abs=0.07289470732212067, max_abs=0.75, mean_rel=0.03275054693222046, max_rel=143.61318969726562, norm_rel=0.006081209983676672, ref_abs_avg=12.991560935974121, test_abs_avg=12.992557525634766
liger_forward grad[92] vs paper_forward: mean_abs=0.06035161018371582, max_abs=0.25, mean_rel=0.06303007900714874, max_rel=10.490331649780273, norm_rel=0.005813146475702524, ref_abs_avg=10.871389389038086, test_abs_avg=10.868155479431152
liger_forward grad[93] vs paper_forward: mean_abs=0.06964853405952454, max_abs=1.0, mean_rel=0.03378624469041824, max_rel=101.75177001953125, norm_rel=0.0060317907482385635, ref_abs_avg=12.559805870056152, test_abs_avg=12.559978485107422
liger_forward grad[94] vs paper_forward: mean_abs=0.06769586354494095, max_abs=0.75, mean_rel=0.032991182059049606, max_rel=70.38618469238281, norm_rel=0.005900859832763672, ref_abs_avg=12.483104705810547, test_abs_avg=12.48229694366455
liger_forward grad[95] vs paper_forward: mean_abs=0.05303335189819336, max_abs=0.25, mean_rel=0.01947447843849659, max_rel=0.5908016562461853, norm_rel=0.0056581939570605755, ref_abs_avg=10.407535552978516, test_abs_avg=10.408820152282715
liger_forward grad[96] vs paper_forward: mean_abs=0.06612983345985413, max_abs=1.0, mean_rel=0.03579956665635109, max_rel=165.12425231933594, norm_rel=0.005943900905549526, ref_abs_avg=12.166218757629395, test_abs_avg=12.166120529174805
liger_forward grad[97] vs paper_forward: mean_abs=0.0647238940000534, max_abs=1.0, mean_rel=0.03300542011857033, max_rel=111.77225494384766, norm_rel=0.005853358190506697, ref_abs_avg=12.164737701416016, test_abs_avg=12.164263725280762
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  48.535 ms
torch_compile_phases_forward bwd-only: 39.325 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.754 GiB
paper_forward fwd+bwd:  112.768 ms
paper_forward bwd-only: 88.959 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB
liger_forward fwd+bwd:  45.677 ms
liger_forward bwd-only: 33.318 ms
liger_forward peak allocated: fwd=7.727 GiB, fwd+bwd=7.727 GiB
liger_forward peak reserved:  fwd=7.775 GiB, fwd+bwd=8.088 GiB
production_forward fwd+bwd:  33.802 ms
production_forward bwd-only: 28.846 ms
production_forward peak allocated: fwd=1.174 GiB, fwd+bwd=5.176 GiB
production_forward peak reserved:  fwd=1.240 GiB, fwd+bwd=5.240 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001693776692263782, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008623111993074417, max_abs=0.375, mean_rel=0.07287344336509705, max_rel=98.67727661132812, norm_rel=0.019966285675764084, ref_abs_avg=0.47036439180374146, test_abs_avg=0.4703798294067383
production_forward grad[1] vs paper_forward: mean_abs=5.413717746734619, max_abs=47.0, mean_rel=0.2789081633090973, max_rel=1939.6832275390625, norm_rel=0.020858047530055046, ref_abs_avg=231.82565307617188, test_abs_avg=231.9187469482422
production_forward grad[2] vs paper_forward: mean_abs=0.832064151763916, max_abs=3.5, mean_rel=0.10662024468183517, max_rel=8.633633613586426, norm_rel=0.021930966526269913, ref_abs_avg=38.67307662963867, test_abs_avg=38.595603942871094
production_forward grad[3] vs paper_forward: mean_abs=1.078941822052002, max_abs=7.125, mean_rel=0.16266277432441711, max_rel=2287.3720703125, norm_rel=0.023117512464523315, ref_abs_avg=46.894493103027344, test_abs_avg=46.89729309082031
production_forward grad[4] vs paper_forward: mean_abs=1.0596410036087036, max_abs=6.5, mean_rel=0.16935567557811737, max_rel=1377.561279296875, norm_rel=0.022863442078232765, ref_abs_avg=46.56688690185547, test_abs_avg=46.56617736816406
production_forward grad[5] vs paper_forward: mean_abs=0.7863936424255371, max_abs=3.25, mean_rel=0.10553856939077377, max_rel=15.826299667358398, norm_rel=0.022282054647803307, ref_abs_avg=36.11885070800781, test_abs_avg=36.0291748046875
production_forward grad[6] vs paper_forward: mean_abs=0.9699997305870056, max_abs=6.5, mean_rel=0.16407762467861176, max_rel=1605.8031005859375, norm_rel=0.022943023592233658, ref_abs_avg=42.49825668334961, test_abs_avg=42.498992919921875
production_forward grad[7] vs paper_forward: mean_abs=0.946007251739502, max_abs=6.375, mean_rel=0.1453532874584198, max_rel=1075.0177001953125, norm_rel=0.022704368457198143, ref_abs_avg=41.88818359375, test_abs_avg=41.891448974609375
production_forward grad[8] vs paper_forward: mean_abs=0.7340812683105469, max_abs=2.75, mean_rel=0.10852093994617462, max_rel=6.289504528045654, norm_rel=0.02379259280860424, ref_abs_avg=31.03903579711914, test_abs_avg=31.07823944091797
production_forward grad[9] vs paper_forward: mean_abs=0.8813992738723755, max_abs=5.078125, mean_rel=0.16585128009319305, max_rel=1785.9130859375, norm_rel=0.02286062389612198, ref_abs_avg=38.770042419433594, test_abs_avg=38.77347946166992
production_forward grad[10] vs paper_forward: mean_abs=0.856842041015625, max_abs=5.140625, mean_rel=0.14627310633659363, max_rel=576.0772705078125, norm_rel=0.022447772324085236, ref_abs_avg=38.3868408203125, test_abs_avg=38.391761779785156
production_forward grad[11] vs paper_forward: mean_abs=0.6395797729492188, max_abs=2.5, mean_rel=0.07822948694229126, max_rel=3.0310006141662598, norm_rel=0.021783865988254547, ref_abs_avg=30.499454498291016, test_abs_avg=30.510440826416016
production_forward grad[12] vs paper_forward: mean_abs=0.8149061799049377, max_abs=5.15625, mean_rel=0.1546798050403595, max_rel=1518.1563720703125, norm_rel=0.022554952651262283, ref_abs_avg=36.29523849487305, test_abs_avg=36.29808044433594
production_forward grad[13] vs paper_forward: mean_abs=0.8018854856491089, max_abs=5.25, mean_rel=0.16124936938285828, max_rel=2388.97998046875, norm_rel=0.0225839763879776, ref_abs_avg=35.69153594970703, test_abs_avg=35.69502258300781
production_forward grad[14] vs paper_forward: mean_abs=0.6497125625610352, max_abs=2.5, mean_rel=0.10255417972803116, max_rel=4.9878950119018555, norm_rel=0.022609954699873924, ref_abs_avg=28.313800811767578, test_abs_avg=28.348491668701172
production_forward grad[15] vs paper_forward: mean_abs=0.7557686567306519, max_abs=4.625, mean_rel=0.1585717499256134, max_rel=857.8793334960938, norm_rel=0.02246774546802044, ref_abs_avg=33.81970977783203, test_abs_avg=33.82118225097656
production_forward grad[16] vs paper_forward: mean_abs=0.7389975786209106, max_abs=4.75, mean_rel=0.13934199512004852, max_rel=489.4330139160156, norm_rel=0.022075211629271507, ref_abs_avg=33.6298828125, test_abs_avg=33.63188934326172
production_forward grad[17] vs paper_forward: mean_abs=0.5982999801635742, max_abs=2.53125, mean_rel=0.09760186821222305, max_rel=4.351015567779541, norm_rel=0.024662038311362267, ref_abs_avg=24.561782836914062, test_abs_avg=24.575754165649414
production_forward grad[18] vs paper_forward: mean_abs=0.7137331962585449, max_abs=4.5, mean_rel=0.1584016978740692, max_rel=919.0038452148438, norm_rel=0.022252622991800308, ref_abs_avg=32.225006103515625, test_abs_avg=32.224647521972656
production_forward grad[19] vs paper_forward: mean_abs=0.7005107402801514, max_abs=4.125, mean_rel=0.1363741159439087, max_rel=487.46649169921875, norm_rel=0.022164801135659218, ref_abs_avg=31.73087501525879, test_abs_avg=31.73013687133789
production_forward grad[20] vs paper_forward: mean_abs=0.5466398000717163, max_abs=2.125, mean_rel=0.2392691820859909, max_rel=54.249267578125, norm_rel=0.021271083503961563, ref_abs_avg=25.682056427001953, test_abs_avg=25.677669525146484
production_forward grad[21] vs paper_forward: mean_abs=0.6810181140899658, max_abs=4.25, mean_rel=0.1495668888092041, max_rel=842.3333129882812, norm_rel=0.022303607314825058, ref_abs_avg=30.71664047241211, test_abs_avg=30.71788215637207
production_forward grad[22] vs paper_forward: mean_abs=0.6633514165878296, max_abs=4.0, mean_rel=0.1550341248512268, max_rel=1056.6405029296875, norm_rel=0.022123297676444054, ref_abs_avg=30.203857421875, test_abs_avg=30.20203399658203
production_forward grad[23] vs paper_forward: mean_abs=0.5451377630233765, max_abs=2.0, mean_rel=0.23521137237548828, max_rel=57.24455261230469, norm_rel=0.02204110659658909, ref_abs_avg=24.451862335205078, test_abs_avg=24.399106979370117
production_forward grad[24] vs paper_forward: mean_abs=0.6463004350662231, max_abs=3.796875, mean_rel=0.1446363925933838, max_rel=1085.619140625, norm_rel=0.022177377715706825, ref_abs_avg=29.267709732055664, test_abs_avg=29.267070770263672
production_forward grad[25] vs paper_forward: mean_abs=0.6329082250595093, max_abs=4.5, mean_rel=0.1511279046535492, max_rel=861.8772583007812, norm_rel=0.02196819707751274, ref_abs_avg=28.952396392822266, test_abs_avg=28.952434539794922
production_forward grad[26] vs paper_forward: mean_abs=0.6111745834350586, max_abs=3.0, mean_rel=0.07522071897983551, max_rel=3.2281744480133057, norm_rel=0.02330351248383522, ref_abs_avg=26.486351013183594, test_abs_avg=26.462501525878906
production_forward grad[27] vs paper_forward: mean_abs=0.7451772689819336, max_abs=4.53125, mean_rel=0.14865843951702118, max_rel=1024.31494140625, norm_rel=0.023785417899489403, ref_abs_avg=31.42788314819336, test_abs_avg=31.426490783691406
production_forward grad[28] vs paper_forward: mean_abs=0.7283912897109985, max_abs=4.375, mean_rel=0.15173892676830292, max_rel=797.6168823242188, norm_rel=0.02365591749548912, ref_abs_avg=30.89696502685547, test_abs_avg=30.902681350708008
production_forward grad[29] vs paper_forward: mean_abs=0.5737104415893555, max_abs=2.4375, mean_rel=0.06546853482723236, max_rel=2.4756758213043213, norm_rel=0.02515963837504387, ref_abs_avg=23.304880142211914, test_abs_avg=23.283306121826172
production_forward grad[30] vs paper_forward: mean_abs=0.69080650806427, max_abs=4.75, mean_rel=0.16044825315475464, max_rel=1535.5269775390625, norm_rel=0.02417382039129734, ref_abs_avg=28.68146514892578, test_abs_avg=28.681324005126953
production_forward grad[31] vs paper_forward: mean_abs=0.6850636005401611, max_abs=4.0, mean_rel=0.15631914138793945, max_rel=546.58984375, norm_rel=0.023912589997053146, ref_abs_avg=28.736083984375, test_abs_avg=28.73670196533203
production_forward grad[32] vs paper_forward: mean_abs=0.5188074111938477, max_abs=1.875, mean_rel=0.07798357307910919, max_rel=4.502346515655518, norm_rel=0.022766295820474625, ref_abs_avg=23.309696197509766, test_abs_avg=23.29640007019043
production_forward grad[33] vs paper_forward: mean_abs=0.6487979888916016, max_abs=4.2421875, mean_rel=0.15794312953948975, max_rel=1051.1771240234375, norm_rel=0.023984676226973534, ref_abs_avg=27.10666275024414, test_abs_avg=27.107315063476562
production_forward grad[34] vs paper_forward: mean_abs=0.6378803253173828, max_abs=4.4375, mean_rel=0.16935782134532928, max_rel=928.0316772460938, norm_rel=0.024052102118730545, ref_abs_avg=26.581832885742188, test_abs_avg=26.58855628967285
production_forward grad[35] vs paper_forward: mean_abs=0.4939231872558594, max_abs=2.0, mean_rel=0.0985535979270935, max_rel=4.034502029418945, norm_rel=0.02252327837049961, ref_abs_avg=21.716995239257812, test_abs_avg=21.73889923095703
production_forward grad[36] vs paper_forward: mean_abs=0.6004055142402649, max_abs=4.0, mean_rel=0.15549927949905396, max_rel=1025.6533203125, norm_rel=0.023806309327483177, ref_abs_avg=25.262340545654297, test_abs_avg=25.26101303100586
production_forward grad[37] vs paper_forward: mean_abs=0.5947871804237366, max_abs=4.0703125, mean_rel=0.16149193048477173, max_rel=683.6168212890625, norm_rel=0.023839173838496208, ref_abs_avg=25.026275634765625, test_abs_avg=25.02457046508789
production_forward grad[38] vs paper_forward: mean_abs=0.47359681129455566, max_abs=2.375, mean_rel=0.09025664627552032, max_rel=4.066248893737793, norm_rel=0.025105847045779228, ref_abs_avg=19.677589416503906, test_abs_avg=19.658796310424805
production_forward grad[39] vs paper_forward: mean_abs=0.5728563666343689, max_abs=3.75, mean_rel=0.14725589752197266, max_rel=587.7398071289062, norm_rel=0.02350921928882599, ref_abs_avg=24.436561584472656, test_abs_avg=24.437801361083984
production_forward grad[40] vs paper_forward: mean_abs=0.561445951461792, max_abs=3.9375, mean_rel=0.1499047577381134, max_rel=453.40679931640625, norm_rel=0.02344157174229622, ref_abs_avg=24.005916595458984, test_abs_avg=24.005538940429688
production_forward grad[41] vs paper_forward: mean_abs=0.42240405082702637, max_abs=1.78125, mean_rel=0.06562312692403793, max_rel=2.1550559997558594, norm_rel=0.021764567121863365, ref_abs_avg=19.99234390258789, test_abs_avg=19.987895965576172
production_forward grad[42] vs paper_forward: mean_abs=0.5468701124191284, max_abs=3.5, mean_rel=0.15199783444404602, max_rel=888.844970703125, norm_rel=0.023333368822932243, ref_abs_avg=23.465953826904297, test_abs_avg=23.46485137939453
production_forward grad[43] vs paper_forward: mean_abs=0.5376806259155273, max_abs=3.25, mean_rel=0.14867453277111053, max_rel=468.4576110839844, norm_rel=0.023172376677393913, ref_abs_avg=23.213542938232422, test_abs_avg=23.21872329711914
production_forward grad[44] vs paper_forward: mean_abs=0.4133443832397461, max_abs=1.5234375, mean_rel=0.18392571806907654, max_rel=46.09468460083008, norm_rel=0.022642627358436584, ref_abs_avg=18.385509490966797, test_abs_avg=18.34803009033203
production_forward grad[45] vs paper_forward: mean_abs=0.5228708982467651, max_abs=3.46875, mean_rel=0.14894434809684753, max_rel=623.7401733398438, norm_rel=0.023119663819670677, ref_abs_avg=22.604339599609375, test_abs_avg=22.60350799560547
production_forward grad[46] vs paper_forward: mean_abs=0.5086054801940918, max_abs=3.125, mean_rel=0.14977313578128815, max_rel=661.55322265625, norm_rel=0.0230433139950037, ref_abs_avg=22.086528778076172, test_abs_avg=22.0815372467041
production_forward grad[47] vs paper_forward: mean_abs=0.4223794937133789, max_abs=1.5, mean_rel=0.07822830975055695, max_rel=6.057792663574219, norm_rel=0.023073526099324226, ref_abs_avg=18.29098892211914, test_abs_avg=18.28494644165039
production_forward grad[48] vs paper_forward: mean_abs=0.4990331828594208, max_abs=3.109375, mean_rel=0.1526922732591629, max_rel=1116.44482421875, norm_rel=0.02307213470339775, ref_abs_avg=21.66227912902832, test_abs_avg=21.663116455078125
production_forward grad[49] vs paper_forward: mean_abs=0.4871712625026703, max_abs=3.1875, mean_rel=0.14783184230327606, max_rel=555.2857666015625, norm_rel=0.02285105176270008, ref_abs_avg=21.339550018310547, test_abs_avg=21.33979034423828
production_forward grad[50] vs paper_forward: mean_abs=0.42070043087005615, max_abs=1.625, mean_rel=0.23523104190826416, max_rel=70.59114074707031, norm_rel=0.022942813113331795, ref_abs_avg=18.659746170043945, test_abs_avg=18.64390754699707
production_forward grad[51] vs paper_forward: mean_abs=0.5440800189971924, max_abs=3.6875, mean_rel=0.1672476828098297, max_rel=752.7179565429688, norm_rel=0.024821948260068893, ref_abs_avg=21.946121215820312, test_abs_avg=21.945945739746094
production_forward grad[52] vs paper_forward: mean_abs=0.5371066927909851, max_abs=3.5, mean_rel=0.15963377058506012, max_rel=591.7245483398438, norm_rel=0.02448492869734764, ref_abs_avg=21.969829559326172, test_abs_avg=21.968395233154297
production_forward grad[53] vs paper_forward: mean_abs=0.435805082321167, max_abs=1.71875, mean_rel=0.27280697226524353, max_rel=37.75001525878906, norm_rel=0.025741908699274063, ref_abs_avg=16.77495765686035, test_abs_avg=16.761734008789062
production_forward grad[54] vs paper_forward: mean_abs=0.5050301551818848, max_abs=3.28125, mean_rel=0.15685713291168213, max_rel=1063.386474609375, norm_rel=0.024625148624181747, ref_abs_avg=20.52323341369629, test_abs_avg=20.52365493774414
production_forward grad[55] vs paper_forward: mean_abs=0.4977990388870239, max_abs=3.75, mean_rel=0.1576564610004425, max_rel=826.5042114257812, norm_rel=0.0241997167468071, ref_abs_avg=20.58697509765625, test_abs_avg=20.59238624572754
production_forward grad[56] vs paper_forward: mean_abs=0.3951835632324219, max_abs=1.49609375, mean_rel=0.10832220315933228, max_rel=8.508086204528809, norm_rel=0.02322559431195259, ref_abs_avg=16.944459915161133, test_abs_avg=16.93587303161621
production_forward grad[57] vs paper_forward: mean_abs=0.47246718406677246, max_abs=3.25, mean_rel=0.14795786142349243, max_rel=751.3873901367188, norm_rel=0.02400837279856205, ref_abs_avg=19.699697494506836, test_abs_avg=19.699569702148438
production_forward grad[58] vs paper_forward: mean_abs=0.4636349678039551, max_abs=3.5, mean_rel=0.15570348501205444, max_rel=1085.58251953125, norm_rel=0.023905355483293533, ref_abs_avg=19.443675994873047, test_abs_avg=19.443374633789062
production_forward grad[59] vs paper_forward: mean_abs=0.34766924381256104, max_abs=1.375, mean_rel=0.10432380437850952, max_rel=11.021632194519043, norm_rel=0.022845938801765442, ref_abs_avg=15.64433479309082, test_abs_avg=15.637840270996094
production_forward grad[60] vs paper_forward: mean_abs=0.44646942615509033, max_abs=3.25, mean_rel=0.15630696713924408, max_rel=952.6480712890625, norm_rel=0.023748809471726418, ref_abs_avg=18.79003143310547, test_abs_avg=18.791728973388672
production_forward grad[61] vs paper_forward: mean_abs=0.4365569055080414, max_abs=3.390625, mean_rel=0.14830130338668823, max_rel=1058.58544921875, norm_rel=0.023243945091962814, ref_abs_avg=18.802650451660156, test_abs_avg=18.807533264160156
production_forward grad[62] vs paper_forward: mean_abs=0.3452470302581787, max_abs=1.375, mean_rel=0.11489707976579666, max_rel=21.57501792907715, norm_rel=0.02463722974061966, ref_abs_avg=14.518030166625977, test_abs_avg=14.50387954711914
production_forward grad[63] vs paper_forward: mean_abs=0.4205363392829895, max_abs=3.0, mean_rel=0.1527881920337677, max_rel=781.7178344726562, norm_rel=0.023418540135025978, ref_abs_avg=17.936813354492188, test_abs_avg=17.9384708404541
production_forward grad[64] vs paper_forward: mean_abs=0.41223394870758057, max_abs=3.0, mean_rel=0.14275270700454712, max_rel=338.65875244140625, norm_rel=0.023044686764478683, ref_abs_avg=17.876873016357422, test_abs_avg=17.871788024902344
production_forward grad[65] vs paper_forward: mean_abs=0.3155028223991394, max_abs=1.375, mean_rel=0.2572644352912903, max_rel=57.73353958129883, norm_rel=0.021714411675930023, ref_abs_avg=14.72344970703125, test_abs_avg=14.711944580078125
production_forward grad[66] vs paper_forward: mean_abs=0.3976879119873047, max_abs=2.8125, mean_rel=0.14256420731544495, max_rel=760.8618774414062, norm_rel=0.022763192653656006, ref_abs_avg=17.43454360961914, test_abs_avg=17.435504913330078
production_forward grad[67] vs paper_forward: mean_abs=0.3923249840736389, max_abs=2.625, mean_rel=0.146078959107399, max_rel=490.7203674316406, norm_rel=0.022950705140829086, ref_abs_avg=17.09231185913086, test_abs_avg=17.097017288208008
production_forward grad[68] vs paper_forward: mean_abs=0.3117523193359375, max_abs=1.34375, mean_rel=0.0646403580904007, max_rel=2.719477891921997, norm_rel=0.02145530842244625, ref_abs_avg=14.79220962524414, test_abs_avg=14.804542541503906
production_forward grad[69] vs paper_forward: mean_abs=0.37765172123908997, max_abs=3.25, mean_rel=0.14151990413665771, max_rel=738.396240234375, norm_rel=0.02238314412534237, ref_abs_avg=16.836881637573242, test_abs_avg=16.838029861450195
production_forward grad[70] vs paper_forward: mean_abs=0.3735164999961853, max_abs=2.921875, mean_rel=0.14285007119178772, max_rel=919.497802734375, norm_rel=0.022488757967948914, ref_abs_avg=16.57465362548828, test_abs_avg=16.572601318359375
production_forward grad[71] vs paper_forward: mean_abs=0.30178356170654297, max_abs=1.234375, mean_rel=0.06012345850467682, max_rel=1.3081039190292358, norm_rel=0.02213945984840393, ref_abs_avg=14.114002227783203, test_abs_avg=14.087145805358887
production_forward grad[72] vs paper_forward: mean_abs=0.36303529143333435, max_abs=3.5, mean_rel=0.14079201221466064, max_rel=496.30548095703125, norm_rel=0.022197671234607697, ref_abs_avg=16.301116943359375, test_abs_avg=16.3016357421875
production_forward grad[73] vs paper_forward: mean_abs=0.35585200786590576, max_abs=2.25, mean_rel=0.15075694024562836, max_rel=1174.083740234375, norm_rel=0.022097887471318245, ref_abs_avg=16.082740783691406, test_abs_avg=16.08285903930664
production_forward grad[74] vs paper_forward: mean_abs=0.3519625663757324, max_abs=1.4375, mean_rel=0.08767090737819672, max_rel=7.223723888397217, norm_rel=0.024168070405721664, ref_abs_avg=14.657404899597168, test_abs_avg=14.665517807006836
production_forward grad[75] vs paper_forward: mean_abs=0.4136236608028412, max_abs=3.0, mean_rel=0.15438488125801086, max_rel=1364.5667724609375, norm_rel=0.02326316200196743, ref_abs_avg=17.775854110717773, test_abs_avg=17.775346755981445
production_forward grad[76] vs paper_forward: mean_abs=0.39902037382125854, max_abs=3.25, mean_rel=0.15089578926563263, max_rel=593.75, norm_rel=0.022901488468050957, ref_abs_avg=17.414386749267578, test_abs_avg=17.411867141723633
production_forward grad[77] vs paper_forward: mean_abs=0.32369208335876465, max_abs=1.5, mean_rel=0.18795955181121826, max_rel=14.136146545410156, norm_rel=0.022374073043465614, ref_abs_avg=14.22783088684082, test_abs_avg=14.212738990783691
production_forward grad[78] vs paper_forward: mean_abs=0.38033896684646606, max_abs=3.0625, mean_rel=0.14923828840255737, max_rel=734.60107421875, norm_rel=0.02246604859828949, ref_abs_avg=16.897254943847656, test_abs_avg=16.897624969482422
production_forward grad[79] vs paper_forward: mean_abs=0.36766570806503296, max_abs=2.75, mean_rel=0.14577355980873108, max_rel=575.832275390625, norm_rel=0.021901516243815422, ref_abs_avg=16.746522903442383, test_abs_avg=16.743711471557617
production_forward grad[80] vs paper_forward: mean_abs=0.2827122211456299, max_abs=1.1875, mean_rel=0.11209268867969513, max_rel=22.481382369995117, norm_rel=0.020410211756825447, ref_abs_avg=14.415164947509766, test_abs_avg=14.38991641998291
production_forward grad[81] vs paper_forward: mean_abs=0.3526137173175812, max_abs=3.40625, mean_rel=0.14147672057151794, max_rel=491.4716491699219, norm_rel=0.021971257403492928, ref_abs_avg=16.03241729736328, test_abs_avg=16.032188415527344
production_forward grad[82] vs paper_forward: mean_abs=0.34584006667137146, max_abs=2.796875, mean_rel=0.13668635487556458, max_rel=853.4052124023438, norm_rel=0.021766167134046555, ref_abs_avg=15.862528800964355, test_abs_avg=15.860763549804688
production_forward grad[83] vs paper_forward: mean_abs=0.2698936462402344, max_abs=1.125, mean_rel=0.06257502734661102, max_rel=3.92126202583313, norm_rel=0.020876498892903328, ref_abs_avg=12.945232391357422, test_abs_avg=12.935470581054688
production_forward grad[84] vs paper_forward: mean_abs=0.3359127640724182, max_abs=3.25, mean_rel=0.1421007513999939, max_rel=625.0783081054688, norm_rel=0.021571902558207512, ref_abs_avg=15.5449857711792, test_abs_avg=15.545722007751465
production_forward grad[85] vs paper_forward: mean_abs=0.323611319065094, max_abs=2.75, mean_rel=0.13930533826351166, max_rel=521.86279296875, norm_rel=0.021175354719161987, ref_abs_avg=15.213139533996582, test_abs_avg=15.210372924804688
production_forward grad[86] vs paper_forward: mean_abs=0.2709932327270508, max_abs=1.0625, mean_rel=0.06100497394800186, max_rel=1.5026575326919556, norm_rel=0.02040913701057434, ref_abs_avg=13.916550636291504, test_abs_avg=13.92930793762207
production_forward grad[87] vs paper_forward: mean_abs=0.3147282898426056, max_abs=3.484375, mean_rel=0.1311986744403839, max_rel=596.166259765625, norm_rel=0.02096789889037609, ref_abs_avg=15.01519775390625, test_abs_avg=15.014915466308594
production_forward grad[88] vs paper_forward: mean_abs=0.30102601647377014, max_abs=3.125, mean_rel=0.12521016597747803, max_rel=470.70367431640625, norm_rel=0.020734721794724464, ref_abs_avg=14.65792465209961, test_abs_avg=14.652998924255371
production_forward grad[89] vs paper_forward: mean_abs=0.24280469119548798, max_abs=1.125, mean_rel=0.09880180656909943, max_rel=11.481964111328125, norm_rel=0.021589815616607666, ref_abs_avg=11.797924041748047, test_abs_avg=11.786266326904297
production_forward grad[90] vs paper_forward: mean_abs=0.2886236310005188, max_abs=2.75, mean_rel=0.12564295530319214, max_rel=358.91748046875, norm_rel=0.020450666546821594, ref_abs_avg=14.210868835449219, test_abs_avg=14.210663795471191
production_forward grad[91] vs paper_forward: mean_abs=0.2813660204410553, max_abs=2.5, mean_rel=0.13393887877464294, max_rel=375.44195556640625, norm_rel=0.020323120057582855, ref_abs_avg=13.963833808898926, test_abs_avg=13.960187911987305
production_forward grad[92] vs paper_forward: mean_abs=0.21141886711120605, max_abs=0.9453125, mean_rel=0.07653658837080002, max_rel=11.17034912109375, norm_rel=0.0185979176312685, ref_abs_avg=11.92471981048584, test_abs_avg=11.907630920410156
production_forward grad[93] vs paper_forward: mean_abs=0.2770153284072876, max_abs=3.75, mean_rel=0.12316267192363739, max_rel=631.0134887695312, norm_rel=0.01976471021771431, ref_abs_avg=14.157875061035156, test_abs_avg=14.157171249389648
production_forward grad[94] vs paper_forward: mean_abs=0.2701234519481659, max_abs=3.0, mean_rel=0.12531203031539917, max_rel=510.8953552246094, norm_rel=0.019618995487689972, ref_abs_avg=13.921870231628418, test_abs_avg=13.914844512939453
production_forward grad[95] vs paper_forward: mean_abs=0.21480560302734375, max_abs=0.8125, mean_rel=0.06385080516338348, max_rel=3.1695504188537598, norm_rel=0.020076025277376175, ref_abs_avg=11.001059532165527, test_abs_avg=10.978800773620605
production_forward grad[96] vs paper_forward: mean_abs=0.25604021549224854, max_abs=3.0, mean_rel=0.11955840140581131, max_rel=703.134765625, norm_rel=0.019403906539082527, ref_abs_avg=13.39605712890625, test_abs_avg=13.396307945251465
production_forward grad[97] vs paper_forward: mean_abs=0.2527437210083008, max_abs=2.6875, mean_rel=0.11903245747089386, max_rel=1039.0853271484375, norm_rel=0.019310681149363518, ref_abs_avg=13.405138969421387, test_abs_avg=13.407012939453125
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001698662992566824, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008984197862446308, max_abs=0.4375, mean_rel=0.07552047073841095, max_rel=107.6690902709961, norm_rel=0.020683208480477333, ref_abs_avg=0.47036439180374146, test_abs_avg=0.470369815826416
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.497748374938965, max_abs=48.0, mean_rel=0.21670718491077423, max_rel=872.8619995117188, norm_rel=0.02119326964020729, ref_abs_avg=231.82565307617188, test_abs_avg=231.92111206054688
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.8935200572013855, max_abs=3.875, mean_rel=0.08732300996780396, max_rel=4.776328086853027, norm_rel=0.02294299006462097, ref_abs_avg=38.67307662963867, test_abs_avg=38.56929397583008
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.1189467906951904, max_abs=6.5, mean_rel=0.16456331312656403, max_rel=1930.7764892578125, norm_rel=0.02397145889699459, ref_abs_avg=46.894493103027344, test_abs_avg=46.897239685058594
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.100398302078247, max_abs=7.0, mean_rel=0.18497473001480103, max_rel=1606.0250244140625, norm_rel=0.02375100925564766, ref_abs_avg=46.56688690185547, test_abs_avg=46.57354736328125
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.8513617515563965, max_abs=3.75, mean_rel=0.08173082023859024, max_rel=4.220346450805664, norm_rel=0.024304835125803947, ref_abs_avg=36.11885070800781, test_abs_avg=36.0550651550293
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.004289984703064, max_abs=6.5, mean_rel=0.17491373419761658, max_rel=1820.7010498046875, norm_rel=0.023732444271445274, ref_abs_avg=42.49825668334961, test_abs_avg=42.50082778930664
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.9770103693008423, max_abs=6.609375, mean_rel=0.14654624462127686, max_rel=867.7574462890625, norm_rel=0.023446233943104744, ref_abs_avg=41.88818359375, test_abs_avg=41.89282989501953
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7395248413085938, max_abs=3.375, mean_rel=0.1028369814157486, max_rel=5.550490856170654, norm_rel=0.024023491889238358, ref_abs_avg=31.03903579711914, test_abs_avg=31.08159637451172
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.909281849861145, max_abs=5.875, mean_rel=0.16866861283779144, max_rel=1236.6019287109375, norm_rel=0.023595476523041725, ref_abs_avg=38.770042419433594, test_abs_avg=38.77332305908203
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8836703300476074, max_abs=5.25, mean_rel=0.15728497505187988, max_rel=1406.013671875, norm_rel=0.02317207306623459, ref_abs_avg=38.3868408203125, test_abs_avg=38.38835144042969
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6541571617126465, max_abs=2.5, mean_rel=0.07351614534854889, max_rel=2.4171271324157715, norm_rel=0.02207275852560997, ref_abs_avg=30.499454498291016, test_abs_avg=30.4981632232666
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.8402148485183716, max_abs=5.125, mean_rel=0.15450197458267212, max_rel=1553.8482666015625, norm_rel=0.023257868364453316, ref_abs_avg=36.29523849487305, test_abs_avg=36.29827117919922
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.8265733122825623, max_abs=5.0, mean_rel=0.16886645555496216, max_rel=1837.596923828125, norm_rel=0.023275457322597504, ref_abs_avg=35.69153594970703, test_abs_avg=35.694419860839844
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.6631345748901367, max_abs=2.6875, mean_rel=0.10993161797523499, max_rel=6.4577836990356445, norm_rel=0.023639168590307236, ref_abs_avg=28.313800811767578, test_abs_avg=28.32927131652832
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7790449261665344, max_abs=5.140625, mean_rel=0.1628216803073883, max_rel=994.5846557617188, norm_rel=0.023148836567997932, ref_abs_avg=33.81970977783203, test_abs_avg=33.81950378417969
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.7623844146728516, max_abs=5.0, mean_rel=0.14535005390644073, max_rel=786.08984375, norm_rel=0.02277175523340702, ref_abs_avg=33.6298828125, test_abs_avg=33.632713317871094
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.6122614145278931, max_abs=2.25, mean_rel=0.10203389823436737, max_rel=5.127725601196289, norm_rel=0.025206226855516434, ref_abs_avg=24.561782836914062, test_abs_avg=24.590248107910156
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.7343872785568237, max_abs=5.5, mean_rel=0.16218966245651245, max_rel=884.8005981445312, norm_rel=0.022880354896187782, ref_abs_avg=32.225006103515625, test_abs_avg=32.22542190551758
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.719429612159729, max_abs=4.25, mean_rel=0.14240428805351257, max_rel=784.0853881835938, norm_rel=0.022771716117858887, ref_abs_avg=31.73087501525879, test_abs_avg=31.732494354248047
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5574468374252319, max_abs=2.0, mean_rel=0.26747503876686096, max_rel=44.10806655883789, norm_rel=0.021713685244321823, ref_abs_avg=25.682056427001953, test_abs_avg=25.677162170410156
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6984929442405701, max_abs=4.3125, mean_rel=0.15316003561019897, max_rel=1269.9461669921875, norm_rel=0.02288307435810566, ref_abs_avg=30.71664047241211, test_abs_avg=30.71772003173828
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6812105178833008, max_abs=4.09375, mean_rel=0.16359969973564148, max_rel=1405.6171875, norm_rel=0.02270743064582348, ref_abs_avg=30.203857421875, test_abs_avg=30.20247459411621
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.5732449293136597, max_abs=2.3125, mean_rel=0.3923671841621399, max_rel=135.1384735107422, norm_rel=0.023352008312940598, ref_abs_avg=24.451862335205078, test_abs_avg=24.407455444335938
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.6614956855773926, max_abs=4.1875, mean_rel=0.148538738489151, max_rel=910.413330078125, norm_rel=0.022694181650877, ref_abs_avg=29.267709732055664, test_abs_avg=29.266441345214844
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.6467767357826233, max_abs=3.96875, mean_rel=0.15871496498584747, max_rel=1020.39892578125, norm_rel=0.022435765713453293, ref_abs_avg=28.952396392822266, test_abs_avg=28.951581954956055
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6523295640945435, max_abs=2.5, mean_rel=0.06766434013843536, max_rel=1.0094808340072632, norm_rel=0.02445492148399353, ref_abs_avg=26.486351013183594, test_abs_avg=26.467247009277344
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7642418146133423, max_abs=5.75, mean_rel=0.15517736971378326, max_rel=913.8799438476562, norm_rel=0.02440100722014904, ref_abs_avg=31.42788314819336, test_abs_avg=31.425615310668945
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.7480630874633789, max_abs=5.0, mean_rel=0.15507274866104126, max_rel=868.5079345703125, norm_rel=0.024272697046399117, ref_abs_avg=30.89696502685547, test_abs_avg=30.901947021484375
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.6154255867004395, max_abs=2.0, mean_rel=0.07382211834192276, max_rel=2.473891258239746, norm_rel=0.02683810517191887, ref_abs_avg=23.304880142211914, test_abs_avg=23.267024993896484
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.7073200941085815, max_abs=4.9375, mean_rel=0.16305656731128693, max_rel=1254.6746826171875, norm_rel=0.024730026721954346, ref_abs_avg=28.68146514892578, test_abs_avg=28.68102264404297
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.698795735836029, max_abs=4.5625, mean_rel=0.16518950462341309, max_rel=818.6468505859375, norm_rel=0.024390799924731255, ref_abs_avg=28.736083984375, test_abs_avg=28.736560821533203
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5215897560119629, max_abs=1.875, mean_rel=0.0768081396818161, max_rel=4.354163646697998, norm_rel=0.022509047761559486, ref_abs_avg=23.309696197509766, test_abs_avg=23.285594940185547
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.6618824005126953, max_abs=4.4296875, mean_rel=0.16238999366760254, max_rel=793.4574584960938, norm_rel=0.024466726928949356, ref_abs_avg=27.10666275024414, test_abs_avg=27.106586456298828
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6508908867835999, max_abs=4.1875, mean_rel=0.1710340827703476, max_rel=1160.0645751953125, norm_rel=0.024555085226893425, ref_abs_avg=26.581832885742188, test_abs_avg=26.586301803588867
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.501272439956665, max_abs=1.875, mean_rel=0.10297366976737976, max_rel=4.104186534881592, norm_rel=0.022960949689149857, ref_abs_avg=21.716995239257812, test_abs_avg=21.736209869384766
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.6129146814346313, max_abs=4.203125, mean_rel=0.16088774800300598, max_rel=718.9949340820312, norm_rel=0.024301815778017044, ref_abs_avg=25.262340545654297, test_abs_avg=25.260902404785156
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.6071246862411499, max_abs=3.8359375, mean_rel=0.1638624370098114, max_rel=720.2942504882812, norm_rel=0.024345653131604195, ref_abs_avg=25.026275634765625, test_abs_avg=25.024940490722656
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.49390125274658203, max_abs=2.0, mean_rel=0.08730651438236237, max_rel=2.1048543453216553, norm_rel=0.02542456053197384, ref_abs_avg=19.677589416503906, test_abs_avg=19.628488540649414
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5828673243522644, max_abs=3.875, mean_rel=0.15228165686130524, max_rel=714.3042602539062, norm_rel=0.023930763825774193, ref_abs_avg=24.436561584472656, test_abs_avg=24.43717384338379
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5715317726135254, max_abs=3.9375, mean_rel=0.15271836519241333, max_rel=543.848388671875, norm_rel=0.023846419528126717, ref_abs_avg=24.005916595458984, test_abs_avg=24.007221221923828
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4475705623626709, max_abs=2.0, mean_rel=0.1015976220369339, max_rel=14.574508666992188, norm_rel=0.023508824408054352, ref_abs_avg=19.99234390258789, test_abs_avg=19.996091842651367
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5565416216850281, max_abs=3.5625, mean_rel=0.15463143587112427, max_rel=977.710693359375, norm_rel=0.02373744733631611, ref_abs_avg=23.465953826904297, test_abs_avg=23.464670181274414
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5461940765380859, max_abs=3.25, mean_rel=0.15128456056118011, max_rel=594.2208251953125, norm_rel=0.023544562980532646, ref_abs_avg=23.213542938232422, test_abs_avg=23.218172073364258
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.42238950729370117, max_abs=2.078125, mean_rel=0.17630241811275482, max_rel=45.29564666748047, norm_rel=0.022868221625685692, ref_abs_avg=18.385509490966797, test_abs_avg=18.352270126342773
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.5315482020378113, max_abs=3.5, mean_rel=0.15188844501972198, max_rel=535.0248413085938, norm_rel=0.023500952869653702, ref_abs_avg=22.604339599609375, test_abs_avg=22.602344512939453
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.5172383785247803, max_abs=3.125, mean_rel=0.15548747777938843, max_rel=959.1500244140625, norm_rel=0.02342003397643566, ref_abs_avg=22.086528778076172, test_abs_avg=22.082284927368164
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.4288877248764038, max_abs=1.4375, mean_rel=0.08382411301136017, max_rel=8.59991455078125, norm_rel=0.02316453494131565, ref_abs_avg=18.29098892211914, test_abs_avg=18.298206329345703
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.5073084831237793, max_abs=3.125, mean_rel=0.15375369787216187, max_rel=869.5202026367188, norm_rel=0.023430699482560158, ref_abs_avg=21.66227912902832, test_abs_avg=21.663148880004883
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.494107723236084, max_abs=3.15234375, mean_rel=0.15027981996536255, max_rel=590.3019409179688, norm_rel=0.02319084294140339, ref_abs_avg=21.339550018310547, test_abs_avg=21.33938217163086
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.4520556330680847, max_abs=1.625, mean_rel=0.1724415123462677, max_rel=42.262168884277344, norm_rel=0.0251067616045475, ref_abs_avg=18.659746170043945, test_abs_avg=18.636276245117188
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5544112324714661, max_abs=3.5, mean_rel=0.17116424441337585, max_rel=832.730224609375, norm_rel=0.025282490998506546, ref_abs_avg=21.946121215820312, test_abs_avg=21.947481155395508
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5487977266311646, max_abs=3.5, mean_rel=0.1611495465040207, max_rel=413.4899597167969, norm_rel=0.025037704035639763, ref_abs_avg=21.969829559326172, test_abs_avg=21.970664978027344
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.4523909091949463, max_abs=1.75, mean_rel=0.30278313159942627, max_rel=50.31269073486328, norm_rel=0.026558632031083107, ref_abs_avg=16.77495765686035, test_abs_avg=16.763084411621094
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.5125316381454468, max_abs=3.75, mean_rel=0.15834033489227295, max_rel=1058.3271484375, norm_rel=0.024992739781737328, ref_abs_avg=20.52323341369629, test_abs_avg=20.524635314941406
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.507296085357666, max_abs=3.75, mean_rel=0.16059809923171997, max_rel=841.6669311523438, norm_rel=0.024621905758976936, ref_abs_avg=20.58697509765625, test_abs_avg=20.59416961669922
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.42090320587158203, max_abs=1.5, mean_rel=0.12401190400123596, max_rel=10.277247428894043, norm_rel=0.024543453007936478, ref_abs_avg=16.944459915161133, test_abs_avg=16.935829162597656
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.4796345829963684, max_abs=3.25, mean_rel=0.152388334274292, max_rel=573.8230590820312, norm_rel=0.02436278760433197, ref_abs_avg=19.699697494506836, test_abs_avg=19.700284957885742
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.4724969267845154, max_abs=3.25, mean_rel=0.1561664640903473, max_rel=1028.1583251953125, norm_rel=0.024346811696887016, ref_abs_avg=19.443675994873047, test_abs_avg=19.443532943725586
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.3520388603210449, max_abs=1.4375, mean_rel=0.10148710012435913, max_rel=11.609453201293945, norm_rel=0.022965416312217712, ref_abs_avg=15.64433479309082, test_abs_avg=15.631359100341797
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.452728807926178, max_abs=3.25, mean_rel=0.1608959287405014, max_rel=1250.9278564453125, norm_rel=0.02408245950937271, ref_abs_avg=18.79003143310547, test_abs_avg=18.791593551635742
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.4425000846385956, max_abs=3.0625, mean_rel=0.1527050882577896, max_rel=1059.2867431640625, norm_rel=0.023579027503728867, ref_abs_avg=18.802650451660156, test_abs_avg=18.80494499206543
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.3451676368713379, max_abs=1.5, mean_rel=0.11187002062797546, max_rel=20.260526657104492, norm_rel=0.024290863424539566, ref_abs_avg=14.518030166625977, test_abs_avg=14.501015663146973
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.42562031745910645, max_abs=3.25, mean_rel=0.15601694583892822, max_rel=873.7329711914062, norm_rel=0.023706534877419472, ref_abs_avg=17.936813354492188, test_abs_avg=17.93865394592285
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.41736090183258057, max_abs=3.0625, mean_rel=0.1455913633108139, max_rel=402.8573913574219, norm_rel=0.02332099713385105, ref_abs_avg=17.876873016357422, test_abs_avg=17.87101936340332
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.3076884150505066, max_abs=1.4375, mean_rel=0.22358018159866333, max_rel=62.94441604614258, norm_rel=0.02129117026925087, ref_abs_avg=14.72344970703125, test_abs_avg=14.714895248413086
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.40280401706695557, max_abs=2.75, mean_rel=0.145368754863739, max_rel=680.6400146484375, norm_rel=0.023056471720337868, ref_abs_avg=17.43454360961914, test_abs_avg=17.436132431030273
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.3965867757797241, max_abs=2.625, mean_rel=0.1527785062789917, max_rel=659.467041015625, norm_rel=0.023195452988147736, ref_abs_avg=17.09231185913086, test_abs_avg=17.098434448242188
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.31795215606689453, max_abs=1.328125, mean_rel=0.07885075360536575, max_rel=5.106782913208008, norm_rel=0.0218580961227417, ref_abs_avg=14.79220962524414, test_abs_avg=14.810921669006348
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.38195499777793884, max_abs=3.0, mean_rel=0.14307630062103271, max_rel=726.4548950195312, norm_rel=0.022634640336036682, ref_abs_avg=16.836881637573242, test_abs_avg=16.838594436645508
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.37593790888786316, max_abs=2.953125, mean_rel=0.14504031836986542, max_rel=814.5824584960938, norm_rel=0.022615807130932808, ref_abs_avg=16.57465362548828, test_abs_avg=16.572998046875
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.32132840156555176, max_abs=1.25, mean_rel=0.06590460240840912, max_rel=1.4839290380477905, norm_rel=0.02269957773387432, ref_abs_avg=14.114002227783203, test_abs_avg=14.083855628967285
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3667241036891937, max_abs=3.75, mean_rel=0.14160418510437012, max_rel=503.473388671875, norm_rel=0.022412508726119995, ref_abs_avg=16.301116943359375, test_abs_avg=16.30146026611328
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.3591098189353943, max_abs=2.5, mean_rel=0.14797943830490112, max_rel=1174.083740234375, norm_rel=0.02229931391775608, ref_abs_avg=16.082740783691406, test_abs_avg=16.08599853515625
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.3461642265319824, max_abs=1.25, mean_rel=0.08974193781614304, max_rel=4.9343791007995605, norm_rel=0.024039102718234062, ref_abs_avg=14.657404899597168, test_abs_avg=14.675483703613281
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.41912001371383667, max_abs=3.1875, mean_rel=0.15702924132347107, max_rel=1204.021728515625, norm_rel=0.023570818826556206, ref_abs_avg=17.775854110717773, test_abs_avg=17.7750244140625
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.4041266441345215, max_abs=3.0, mean_rel=0.15131530165672302, max_rel=475.1044006347656, norm_rel=0.023180732503533363, ref_abs_avg=17.414386749267578, test_abs_avg=17.409408569335938
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.33330798149108887, max_abs=1.59375, mean_rel=0.17945870757102966, max_rel=22.393962860107422, norm_rel=0.022984685376286507, ref_abs_avg=14.22783088684082, test_abs_avg=14.221134185791016
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.3857139050960541, max_abs=3.5, mean_rel=0.1503671109676361, max_rel=756.7031860351562, norm_rel=0.022768674418330193, ref_abs_avg=16.897254943847656, test_abs_avg=16.89777183532715
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.3708206117153168, max_abs=3.125, mean_rel=0.14392481744289398, max_rel=606.8021850585938, norm_rel=0.022063883021473885, ref_abs_avg=16.746522903442383, test_abs_avg=16.7458553314209
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.285442054271698, max_abs=1.28125, mean_rel=0.09913790225982666, max_rel=18.036766052246094, norm_rel=0.020435918122529984, ref_abs_avg=14.415164947509766, test_abs_avg=14.388933181762695
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.35676926374435425, max_abs=3.25, mean_rel=0.14001914858818054, max_rel=574.3494873046875, norm_rel=0.022227881476283073, ref_abs_avg=16.03241729736328, test_abs_avg=16.032258987426758
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.3506331443786621, max_abs=2.5, mean_rel=0.1368781328201294, max_rel=679.8722534179688, norm_rel=0.02203124202787876, ref_abs_avg=15.862528800964355, test_abs_avg=15.861677169799805
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.2584569454193115, max_abs=1.0546875, mean_rel=0.05722208321094513, max_rel=2.9381535053253174, norm_rel=0.02035326510667801, ref_abs_avg=12.945232391357422, test_abs_avg=12.936241149902344
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.33890730142593384, max_abs=3.109375, mean_rel=0.14213865995407104, max_rel=780.2097778320312, norm_rel=0.021774081513285637, ref_abs_avg=15.5449857711792, test_abs_avg=15.545890808105469
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.32599297165870667, max_abs=2.65625, mean_rel=0.14403459429740906, max_rel=545.714599609375, norm_rel=0.02136382646858692, ref_abs_avg=15.213139533996582, test_abs_avg=15.21245002746582
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.2640344500541687, max_abs=1.0, mean_rel=0.06312417984008789, max_rel=3.980179786682129, norm_rel=0.020096253603696823, ref_abs_avg=13.916550636291504, test_abs_avg=13.937458038330078
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.3173055648803711, max_abs=3.078125, mean_rel=0.13088271021842957, max_rel=713.6644897460938, norm_rel=0.02112022414803505, ref_abs_avg=15.01519775390625, test_abs_avg=15.015274047851562
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.30144762992858887, max_abs=3.125, mean_rel=0.12386257201433182, max_rel=454.0581359863281, norm_rel=0.020764678716659546, ref_abs_avg=14.65792465209961, test_abs_avg=14.653736114501953
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.24930989742279053, max_abs=1.205078125, mean_rel=0.12676216661930084, max_rel=12.783563613891602, norm_rel=0.021443411707878113, ref_abs_avg=11.797924041748047, test_abs_avg=11.789777755737305
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.2902481257915497, max_abs=2.8125, mean_rel=0.1253574937582016, max_rel=370.19158935546875, norm_rel=0.020563462749123573, ref_abs_avg=14.210868835449219, test_abs_avg=14.210457801818848
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.2829449474811554, max_abs=2.5, mean_rel=0.1347726285457611, max_rel=425.21044921875, norm_rel=0.020415009930729866, ref_abs_avg=13.963833808898926, test_abs_avg=13.959397315979004
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.22398287057876587, max_abs=0.8671875, mean_rel=0.06323713064193726, max_rel=4.9307661056518555, norm_rel=0.01945917122066021, ref_abs_avg=11.92471981048584, test_abs_avg=11.908010482788086
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.2783568799495697, max_abs=3.75, mean_rel=0.12588387727737427, max_rel=794.4617919921875, norm_rel=0.019846072420477867, ref_abs_avg=14.157875061035156, test_abs_avg=14.157512664794922
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.27189213037490845, max_abs=2.75, mean_rel=0.12639060616493225, max_rel=631.339111328125, norm_rel=0.019689545035362244, ref_abs_avg=13.921870231628418, test_abs_avg=13.913402557373047
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2213383913040161, max_abs=0.875, mean_rel=0.06085819751024246, max_rel=2.2571606636047363, norm_rel=0.020507460460066795, ref_abs_avg=11.001059532165527, test_abs_avg=10.986812591552734
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.2566051483154297, max_abs=3.375, mean_rel=0.12205764651298523, max_rel=1054.5968017578125, norm_rel=0.019449664279818535, ref_abs_avg=13.39605712890625, test_abs_avg=13.396585464477539
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.2504851222038269, max_abs=2.5, mean_rel=0.117601677775383, max_rel=823.232177734375, norm_rel=0.019166449084877968, ref_abs_avg=13.405138969421387, test_abs_avg=13.4069185256958
liger_forward vs paper_forward output: mean_abs=0.00015401412383653224, max_abs=0.03125
liger_forward grad[0] vs paper_forward: mean_abs=0.0036169225350022316, max_abs=0.25, mean_rel=0.02557450532913208, max_rel=67.15528106689453, norm_rel=0.009619316086173058, ref_abs_avg=0.47036439180374146, test_abs_avg=0.47034141421318054
liger_forward grad[1] vs paper_forward: mean_abs=1.6260391473770142, max_abs=16.0, mean_rel=0.08833763003349304, max_rel=476.28277587890625, norm_rel=0.006670147180557251, ref_abs_avg=231.82565307617188, test_abs_avg=231.85304260253906
liger_forward grad[2] vs paper_forward: mean_abs=0.31471407413482666, max_abs=1.25, mean_rel=0.03382425010204315, max_rel=2.7439703941345215, norm_rel=0.00845273956656456, ref_abs_avg=38.67307662963867, test_abs_avg=38.660560607910156
liger_forward grad[3] vs paper_forward: mean_abs=0.3944392800331116, max_abs=3.0, mean_rel=0.057790085673332214, max_rel=838.5614013671875, norm_rel=0.008726665750145912, ref_abs_avg=46.894493103027344, test_abs_avg=46.89597702026367
liger_forward grad[4] vs paper_forward: mean_abs=0.38064926862716675, max_abs=2.375, mean_rel=0.06282274425029755, max_rel=789.9514770507812, norm_rel=0.008511022664606571, ref_abs_avg=46.56688690185547, test_abs_avg=46.56848907470703
liger_forward grad[5] vs paper_forward: mean_abs=0.30057287216186523, max_abs=1.25, mean_rel=0.05502168834209442, max_rel=7.802137851715088, norm_rel=0.00887218862771988, ref_abs_avg=36.11885070800781, test_abs_avg=36.12195587158203
liger_forward grad[6] vs paper_forward: mean_abs=0.3501718044281006, max_abs=2.5, mean_rel=0.05961841344833374, max_rel=555.2107543945312, norm_rel=0.008561338298022747, ref_abs_avg=42.49825668334961, test_abs_avg=42.49894714355469
liger_forward grad[7] vs paper_forward: mean_abs=0.33353304862976074, max_abs=2.375, mean_rel=0.05134385824203491, max_rel=459.7507019042969, norm_rel=0.008302994072437286, ref_abs_avg=41.88818359375, test_abs_avg=41.88656997680664
liger_forward grad[8] vs paper_forward: mean_abs=0.23624801635742188, max_abs=0.8125, mean_rel=0.031981512904167175, max_rel=2.977903127670288, norm_rel=0.007973900064826012, ref_abs_avg=31.03903579711914, test_abs_avg=31.04364776611328
liger_forward grad[9] vs paper_forward: mean_abs=0.31349441409111023, max_abs=2.125, mean_rel=0.056049443781375885, max_rel=427.6131591796875, norm_rel=0.008422240614891052, ref_abs_avg=38.770042419433594, test_abs_avg=38.770599365234375
liger_forward grad[10] vs paper_forward: mean_abs=0.2997138500213623, max_abs=2.0, mean_rel=0.05636054649949074, max_rel=902.0155029296875, norm_rel=0.00816293340176344, ref_abs_avg=38.3868408203125, test_abs_avg=38.38700866699219
liger_forward grad[11] vs paper_forward: mean_abs=0.24114990234375, max_abs=1.0, mean_rel=0.026967883110046387, max_rel=1.62949800491333, norm_rel=0.00834453571587801, ref_abs_avg=30.499454498291016, test_abs_avg=30.47983169555664
liger_forward grad[12] vs paper_forward: mean_abs=0.285047709941864, max_abs=2.0, mean_rel=0.05502631142735481, max_rel=271.23077392578125, norm_rel=0.008190802298486233, ref_abs_avg=36.29523849487305, test_abs_avg=36.294898986816406
liger_forward grad[13] vs paper_forward: mean_abs=0.27671146392822266, max_abs=2.0, mean_rel=0.05288258567452431, max_rel=528.062255859375, norm_rel=0.00810313131660223, ref_abs_avg=35.69153594970703, test_abs_avg=35.693275451660156
liger_forward grad[14] vs paper_forward: mean_abs=0.21001803874969482, max_abs=0.796875, mean_rel=0.03303098678588867, max_rel=1.6514298915863037, norm_rel=0.007676709443330765, ref_abs_avg=28.313800811767578, test_abs_avg=28.300338745117188
liger_forward grad[15] vs paper_forward: mean_abs=0.2622745633125305, max_abs=2.0, mean_rel=0.056115638464689255, max_rel=446.686279296875, norm_rel=0.00810067169368267, ref_abs_avg=33.81970977783203, test_abs_avg=33.8203239440918
liger_forward grad[16] vs paper_forward: mean_abs=0.25359201431274414, max_abs=1.5, mean_rel=0.04739033430814743, max_rel=162.41766357421875, norm_rel=0.00789645966142416, ref_abs_avg=33.6298828125, test_abs_avg=33.628726959228516
liger_forward grad[17] vs paper_forward: mean_abs=0.1942281723022461, max_abs=0.875, mean_rel=0.035267993807792664, max_rel=3.719470262527466, norm_rel=0.008354034274816513, ref_abs_avg=24.561782836914062, test_abs_avg=24.563034057617188
liger_forward grad[18] vs paper_forward: mean_abs=0.24448099732398987, max_abs=1.5, mean_rel=0.055100761353969574, max_rel=538.5540771484375, norm_rel=0.007930979132652283, ref_abs_avg=32.225006103515625, test_abs_avg=32.22576141357422
liger_forward grad[19] vs paper_forward: mean_abs=0.23680414259433746, max_abs=1.5, mean_rel=0.04575345292687416, max_rel=122.65680694580078, norm_rel=0.007821301929652691, ref_abs_avg=31.73087501525879, test_abs_avg=31.731481552124023
liger_forward grad[20] vs paper_forward: mean_abs=0.18476974964141846, max_abs=0.75, mean_rel=0.09643124043941498, max_rel=31.23533058166504, norm_rel=0.007467383984476328, ref_abs_avg=25.682056427001953, test_abs_avg=25.673049926757812
liger_forward grad[21] vs paper_forward: mean_abs=0.23031681776046753, max_abs=1.5, mean_rel=0.05178450047969818, max_rel=286.4247741699219, norm_rel=0.007855556905269623, ref_abs_avg=30.71664047241211, test_abs_avg=30.71673583984375
liger_forward grad[22] vs paper_forward: mean_abs=0.22219140827655792, max_abs=1.25, mean_rel=0.05113932117819786, max_rel=328.4425048828125, norm_rel=0.007730087265372276, ref_abs_avg=30.203857421875, test_abs_avg=30.20318603515625
liger_forward grad[23] vs paper_forward: mean_abs=0.19218073785305023, max_abs=0.75, mean_rel=0.1972409039735794, max_rel=85.15727233886719, norm_rel=0.008060049265623093, ref_abs_avg=24.451862335205078, test_abs_avg=24.442729949951172
liger_forward grad[24] vs paper_forward: mean_abs=0.2176116406917572, max_abs=1.5, mean_rel=0.048225291073322296, max_rel=482.6015625, norm_rel=0.007786906324326992, ref_abs_avg=29.267709732055664, test_abs_avg=29.268054962158203
liger_forward grad[25] vs paper_forward: mean_abs=0.21038992702960968, max_abs=1.5, mean_rel=0.05069056153297424, max_rel=253.35702514648438, norm_rel=0.007631917484104633, ref_abs_avg=28.952396392822266, test_abs_avg=28.953989028930664
liger_forward grad[26] vs paper_forward: mean_abs=0.20341777801513672, max_abs=1.0, mean_rel=0.029010191559791565, max_rel=1.8523763418197632, norm_rel=0.007962768897414207, ref_abs_avg=26.486351013183594, test_abs_avg=26.491168975830078
liger_forward grad[27] vs paper_forward: mean_abs=0.23951828479766846, max_abs=1.5, mean_rel=0.05183542147278786, max_rel=485.3593444824219, norm_rel=0.007971096783876419, ref_abs_avg=31.42788314819336, test_abs_avg=31.428997039794922
liger_forward grad[28] vs paper_forward: mean_abs=0.23120594024658203, max_abs=1.75, mean_rel=0.048223674297332764, max_rel=246.47467041015625, norm_rel=0.00784690584987402, ref_abs_avg=30.89696502685547, test_abs_avg=30.89889144897461
liger_forward grad[29] vs paper_forward: mean_abs=0.17962932586669922, max_abs=0.875, mean_rel=0.0316351093351841, max_rel=3.2555441856384277, norm_rel=0.00821039080619812, ref_abs_avg=23.304880142211914, test_abs_avg=23.298946380615234
liger_forward grad[30] vs paper_forward: mean_abs=0.21440646052360535, max_abs=1.5, mean_rel=0.049349673092365265, max_rel=346.60516357421875, norm_rel=0.007825613021850586, ref_abs_avg=28.68146514892578, test_abs_avg=28.682151794433594
liger_forward grad[31] vs paper_forward: mean_abs=0.20743262767791748, max_abs=1.25, mean_rel=0.05033688619732857, max_rel=414.3248291015625, norm_rel=0.00758689921349287, ref_abs_avg=28.736083984375, test_abs_avg=28.735937118530273
liger_forward grad[32] vs paper_forward: mean_abs=0.1601705551147461, max_abs=0.75, mean_rel=0.03219398856163025, max_rel=3.8744962215423584, norm_rel=0.007317462936043739, ref_abs_avg=23.309696197509766, test_abs_avg=23.304723739624023
liger_forward grad[33] vs paper_forward: mean_abs=0.19649331271648407, max_abs=1.625, mean_rel=0.0485856868326664, max_rel=336.85345458984375, norm_rel=0.007601078599691391, ref_abs_avg=27.10666275024414, test_abs_avg=27.10675811767578
liger_forward grad[34] vs paper_forward: mean_abs=0.18822619318962097, max_abs=1.5, mean_rel=0.0487479493021965, max_rel=302.2459411621094, norm_rel=0.007455388084053993, ref_abs_avg=26.581832885742188, test_abs_avg=26.582286834716797
liger_forward grad[35] vs paper_forward: mean_abs=0.14262449741363525, max_abs=0.625, mean_rel=0.03082304075360298, max_rel=3.2991201877593994, norm_rel=0.006844797171652317, ref_abs_avg=21.716995239257812, test_abs_avg=21.729312896728516
liger_forward grad[36] vs paper_forward: mean_abs=0.1806803047657013, max_abs=1.25, mean_rel=0.048751167953014374, max_rel=309.2531433105469, norm_rel=0.007510003633797169, ref_abs_avg=25.262340545654297, test_abs_avg=25.261734008789062
liger_forward grad[37] vs paper_forward: mean_abs=0.17421524226665497, max_abs=1.0, mean_rel=0.04644038900732994, max_rel=242.0782012939453, norm_rel=0.007333835586905479, ref_abs_avg=25.026275634765625, test_abs_avg=25.027381896972656
liger_forward grad[38] vs paper_forward: mean_abs=0.1437230110168457, max_abs=0.5, mean_rel=0.023602422326803207, max_rel=1.1332069635391235, norm_rel=0.007650646381080151, ref_abs_avg=19.677589416503906, test_abs_avg=19.683574676513672
liger_forward grad[39] vs paper_forward: mean_abs=0.1702783852815628, max_abs=1.25, mean_rel=0.04503089189529419, max_rel=465.4892272949219, norm_rel=0.007331641390919685, ref_abs_avg=24.436561584472656, test_abs_avg=24.437328338623047
liger_forward grad[40] vs paper_forward: mean_abs=0.16313761472702026, max_abs=1.0, mean_rel=0.04297974705696106, max_rel=113.04830169677734, norm_rel=0.007163756061345339, ref_abs_avg=24.005916595458984, test_abs_avg=24.00577735900879
liger_forward grad[41] vs paper_forward: mean_abs=0.1240999698638916, max_abs=0.5, mean_rel=0.03012596257030964, max_rel=2.997706890106201, norm_rel=0.006781459785997868, ref_abs_avg=19.99234390258789, test_abs_avg=19.98204803466797
liger_forward grad[42] vs paper_forward: mean_abs=0.16008228063583374, max_abs=1.125, mean_rel=0.04543129727244377, max_rel=319.8178405761719, norm_rel=0.007187215611338615, ref_abs_avg=23.465953826904297, test_abs_avg=23.46603012084961
liger_forward grad[43] vs paper_forward: mean_abs=0.1546672284603119, max_abs=1.0546875, mean_rel=0.04586777463555336, max_rel=268.5898132324219, norm_rel=0.007041930686682463, ref_abs_avg=23.213542938232422, test_abs_avg=23.21442413330078
liger_forward grad[44] vs paper_forward: mean_abs=0.12332868576049805, max_abs=0.625, mean_rel=0.04725068062543869, max_rel=13.034358978271484, norm_rel=0.0070999301970005035, ref_abs_avg=18.385509490966797, test_abs_avg=18.383825302124023
liger_forward grad[45] vs paper_forward: mean_abs=0.15204308927059174, max_abs=1.0, mean_rel=0.04416566342115402, max_rel=213.30455017089844, norm_rel=0.007086577825248241, ref_abs_avg=22.604339599609375, test_abs_avg=22.604520797729492
liger_forward grad[46] vs paper_forward: mean_abs=0.14696262776851654, max_abs=1.0, mean_rel=0.04470382630825043, max_rel=211.74517822265625, norm_rel=0.0070362272672355175, ref_abs_avg=22.086528778076172, test_abs_avg=22.084205627441406
liger_forward grad[47] vs paper_forward: mean_abs=0.12441909313201904, max_abs=0.5, mean_rel=0.020449649542570114, max_rel=0.7936218976974487, norm_rel=0.007255712058395147, ref_abs_avg=18.29098892211914, test_abs_avg=18.286863327026367
liger_forward grad[48] vs paper_forward: mean_abs=0.14429223537445068, max_abs=1.0, mean_rel=0.043722473084926605, max_rel=235.80421447753906, norm_rel=0.0070341783575713634, ref_abs_avg=21.66227912902832, test_abs_avg=21.662919998168945
liger_forward grad[49] vs paper_forward: mean_abs=0.13777965307235718, max_abs=1.0, mean_rel=0.04438185319304466, max_rel=248.09568786621094, norm_rel=0.006836903281509876, ref_abs_avg=21.339550018310547, test_abs_avg=21.338895797729492
liger_forward grad[50] vs paper_forward: mean_abs=0.14027154445648193, max_abs=0.625, mean_rel=0.061773642897605896, max_rel=17.63382911682129, norm_rel=0.007986598648130894, ref_abs_avg=18.659746170043945, test_abs_avg=18.666786193847656
liger_forward grad[51] vs paper_forward: mean_abs=0.15961217880249023, max_abs=1.0, mean_rel=0.04796139523386955, max_rel=217.07005310058594, norm_rel=0.007613623980432749, ref_abs_avg=21.946121215820312, test_abs_avg=21.946115493774414
liger_forward grad[52] vs paper_forward: mean_abs=0.15605506300926208, max_abs=1.0, mean_rel=0.04420152306556702, max_rel=155.3392791748047, norm_rel=0.00746264960616827, ref_abs_avg=21.969829559326172, test_abs_avg=21.969768524169922
liger_forward grad[53] vs paper_forward: mean_abs=0.11877799034118652, max_abs=0.5, mean_rel=0.057424090802669525, max_rel=5.751465797424316, norm_rel=0.007341928780078888, ref_abs_avg=16.77495765686035, test_abs_avg=16.774513244628906
liger_forward grad[54] vs paper_forward: mean_abs=0.14537356793880463, max_abs=1.25, mean_rel=0.04481850564479828, max_rel=419.5242614746094, norm_rel=0.007439021021127701, ref_abs_avg=20.52323341369629, test_abs_avg=20.522994995117188
liger_forward grad[55] vs paper_forward: mean_abs=0.14189323782920837, max_abs=1.0, mean_rel=0.0438433401286602, max_rel=158.8560791015625, norm_rel=0.007259387988597155, ref_abs_avg=20.58697509765625, test_abs_avg=20.58637809753418
liger_forward grad[56] vs paper_forward: mean_abs=0.10531806945800781, max_abs=0.5, mean_rel=0.03174520656466484, max_rel=1.9547569751739502, norm_rel=0.006843224633485079, ref_abs_avg=16.944459915161133, test_abs_avg=16.936668395996094
liger_forward grad[57] vs paper_forward: mean_abs=0.13575959205627441, max_abs=1.0, mean_rel=0.043139100074768066, max_rel=145.6553192138672, norm_rel=0.007249230984598398, ref_abs_avg=19.699697494506836, test_abs_avg=19.699501037597656
liger_forward grad[58] vs paper_forward: mean_abs=0.13159818947315216, max_abs=1.0, mean_rel=0.04316411912441254, max_rel=264.1361999511719, norm_rel=0.007148257922381163, ref_abs_avg=19.443675994873047, test_abs_avg=19.444866180419922
liger_forward grad[59] vs paper_forward: mean_abs=0.10206222534179688, max_abs=0.46875, mean_rel=0.031931716948747635, max_rel=2.557018756866455, norm_rel=0.007065469864755869, ref_abs_avg=15.64433479309082, test_abs_avg=15.637405395507812
liger_forward grad[60] vs paper_forward: mean_abs=0.12640902400016785, max_abs=1.3125, mean_rel=0.04366559535264969, max_rel=314.65399169921875, norm_rel=0.00708867609500885, ref_abs_avg=18.79003143310547, test_abs_avg=18.789840698242188
liger_forward grad[61] vs paper_forward: mean_abs=0.12320132553577423, max_abs=1.0, mean_rel=0.04177437350153923, max_rel=244.3397979736328, norm_rel=0.006941194646060467, ref_abs_avg=18.802650451660156, test_abs_avg=18.802915573120117
liger_forward grad[62] vs paper_forward: mean_abs=0.09317827224731445, max_abs=0.40625, mean_rel=0.026948658749461174, max_rel=2.9439151287078857, norm_rel=0.00704472791403532, ref_abs_avg=14.518030166625977, test_abs_avg=14.517672538757324
liger_forward grad[63] vs paper_forward: mean_abs=0.1197490394115448, max_abs=1.0, mean_rel=0.0431157648563385, max_rel=280.30694580078125, norm_rel=0.007030150853097439, ref_abs_avg=17.936813354492188, test_abs_avg=17.936975479125977
liger_forward grad[64] vs paper_forward: mean_abs=0.11545953154563904, max_abs=1.0, mean_rel=0.04128788411617279, max_rel=288.6771545410156, norm_rel=0.006834044121205807, ref_abs_avg=17.876873016357422, test_abs_avg=17.876224517822266
liger_forward grad[65] vs paper_forward: mean_abs=0.08972340822219849, max_abs=0.328125, mean_rel=0.10298453271389008, max_rel=23.734403610229492, norm_rel=0.0065343077294528484, ref_abs_avg=14.72344970703125, test_abs_avg=14.717037200927734
liger_forward grad[66] vs paper_forward: mean_abs=0.11174750328063965, max_abs=1.0, mean_rel=0.04099828749895096, max_rel=138.14601135253906, norm_rel=0.006776387803256512, ref_abs_avg=17.43454360961914, test_abs_avg=17.43484115600586
liger_forward grad[67] vs paper_forward: mean_abs=0.10801098495721817, max_abs=0.75, mean_rel=0.0426347553730011, max_rel=160.73272705078125, norm_rel=0.006706200540065765, ref_abs_avg=17.09231185913086, test_abs_avg=17.09270477294922
liger_forward grad[68] vs paper_forward: mean_abs=0.08807945251464844, max_abs=0.375, mean_rel=0.018925409764051437, max_rel=0.9024685025215149, norm_rel=0.006469787564128637, ref_abs_avg=14.79220962524414, test_abs_avg=14.791604995727539
liger_forward grad[69] vs paper_forward: mean_abs=0.10581070184707642, max_abs=1.0, mean_rel=0.03899019956588745, max_rel=158.99661254882812, norm_rel=0.006659054197371006, ref_abs_avg=16.836881637573242, test_abs_avg=16.83687400817871
liger_forward grad[70] vs paper_forward: mean_abs=0.10287930816411972, max_abs=1.0, mean_rel=0.03978808969259262, max_rel=253.30776977539062, norm_rel=0.0065884068608284, ref_abs_avg=16.57465362548828, test_abs_avg=16.576868057250977
liger_forward grad[71] vs paper_forward: mean_abs=0.09056568145751953, max_abs=0.375, mean_rel=0.020904723554849625, max_rel=0.6337724924087524, norm_rel=0.006758592557162046, ref_abs_avg=14.114002227783203, test_abs_avg=14.118941307067871
liger_forward grad[72] vs paper_forward: mean_abs=0.10110561549663544, max_abs=1.0, mean_rel=0.03919125348329544, max_rel=206.17669677734375, norm_rel=0.0065712472423911095, ref_abs_avg=16.301116943359375, test_abs_avg=16.30148696899414
liger_forward grad[73] vs paper_forward: mean_abs=0.09753182530403137, max_abs=0.6435546875, mean_rel=0.03901170566678047, max_rel=120.9515609741211, norm_rel=0.006453723646700382, ref_abs_avg=16.082740783691406, test_abs_avg=16.082902908325195
liger_forward grad[74] vs paper_forward: mean_abs=0.10385513305664062, max_abs=0.421875, mean_rel=0.022868122905492783, max_rel=2.0891008377075195, norm_rel=0.007446951698511839, ref_abs_avg=14.657404899597168, test_abs_avg=14.665999412536621
liger_forward grad[75] vs paper_forward: mean_abs=0.11988751590251923, max_abs=1.0, mean_rel=0.04537225142121315, max_rel=374.1801452636719, norm_rel=0.007106567732989788, ref_abs_avg=17.775854110717773, test_abs_avg=17.776329040527344
liger_forward grad[76] vs paper_forward: mean_abs=0.11545109003782272, max_abs=1.0, mean_rel=0.044165268540382385, max_rel=198.24217224121094, norm_rel=0.007009999360889196, ref_abs_avg=17.414386749267578, test_abs_avg=17.413524627685547
liger_forward grad[77] vs paper_forward: mean_abs=0.0961577296257019, max_abs=0.375, mean_rel=0.033510975539684296, max_rel=2.8014845848083496, norm_rel=0.006955437827855349, ref_abs_avg=14.22783088684082, test_abs_avg=14.217081069946289
liger_forward grad[78] vs paper_forward: mean_abs=0.10906567424535751, max_abs=1.0, mean_rel=0.04222261160612106, max_rel=192.93295288085938, norm_rel=0.00681701023131609, ref_abs_avg=16.897254943847656, test_abs_avg=16.8978214263916
liger_forward grad[79] vs paper_forward: mean_abs=0.10584072023630142, max_abs=1.0, mean_rel=0.040903590619564056, max_rel=124.87708282470703, norm_rel=0.006698866840451956, ref_abs_avg=16.746522903442383, test_abs_avg=16.74597930908203
liger_forward grad[80] vs paper_forward: mean_abs=0.08190083503723145, max_abs=0.5, mean_rel=0.021766509860754013, max_rel=2.8997230529785156, norm_rel=0.00626655388623476, ref_abs_avg=14.415164947509766, test_abs_avg=14.410089492797852
liger_forward grad[81] vs paper_forward: mean_abs=0.10129179060459137, max_abs=1.0, mean_rel=0.03815842047333717, max_rel=198.2017364501953, norm_rel=0.006684464402496815, ref_abs_avg=16.03241729736328, test_abs_avg=16.03289794921875
liger_forward grad[82] vs paper_forward: mean_abs=0.09806053340435028, max_abs=1.0, mean_rel=0.036923665553331375, max_rel=159.56288146972656, norm_rel=0.006564279552549124, ref_abs_avg=15.862528800964355, test_abs_avg=15.863981246948242
liger_forward grad[83] vs paper_forward: mean_abs=0.077392578125, max_abs=0.3125, mean_rel=0.015433434396982193, max_rel=0.3690003454685211, norm_rel=0.006502208299934864, ref_abs_avg=12.945232391357422, test_abs_avg=12.939250946044922
liger_forward grad[84] vs paper_forward: mean_abs=0.09533868730068207, max_abs=1.0, mean_rel=0.03977244347333908, max_rel=194.02293395996094, norm_rel=0.006517540197819471, ref_abs_avg=15.5449857711792, test_abs_avg=15.544949531555176
liger_forward grad[85] vs paper_forward: mean_abs=0.09183689206838608, max_abs=1.0, mean_rel=0.0400729775428772, max_rel=151.09603881835938, norm_rel=0.006421826314181089, ref_abs_avg=15.213139533996582, test_abs_avg=15.211077690124512
liger_forward grad[86] vs paper_forward: mean_abs=0.07539129257202148, max_abs=0.3125, mean_rel=0.021908370777964592, max_rel=2.08437442779541, norm_rel=0.005950355902314186, ref_abs_avg=13.916550636291504, test_abs_avg=13.918661117553711
liger_forward grad[87] vs paper_forward: mean_abs=0.09009253978729248, max_abs=1.0, mean_rel=0.037629567086696625, max_rel=110.6024169921875, norm_rel=0.006395568605512381, ref_abs_avg=15.01519775390625, test_abs_avg=15.015225410461426
liger_forward grad[88] vs paper_forward: mean_abs=0.08498860895633698, max_abs=1.0, mean_rel=0.03596065938472748, max_rel=147.6570281982422, norm_rel=0.006267060525715351, ref_abs_avg=14.65792465209961, test_abs_avg=14.658231735229492
liger_forward grad[89] vs paper_forward: mean_abs=0.07398843765258789, max_abs=0.28125, mean_rel=0.03360265493392944, max_rel=3.143436908721924, norm_rel=0.006645746994763613, ref_abs_avg=11.797924041748047, test_abs_avg=11.801239013671875
liger_forward grad[90] vs paper_forward: mean_abs=0.08301316201686859, max_abs=1.0, mean_rel=0.03552623093128204, max_rel=171.39175415039062, norm_rel=0.006283837836235762, ref_abs_avg=14.210868835449219, test_abs_avg=14.210783004760742
liger_forward grad[91] vs paper_forward: mean_abs=0.08074277639389038, max_abs=0.75, mean_rel=0.038375332951545715, max_rel=221.930908203125, norm_rel=0.006240560673177242, ref_abs_avg=13.963833808898926, test_abs_avg=13.963746070861816
liger_forward grad[92] vs paper_forward: mean_abs=0.06568694114685059, max_abs=0.25, mean_rel=0.022786933928728104, max_rel=3.5418179035186768, norm_rel=0.005976851098239422, ref_abs_avg=11.92471981048584, test_abs_avg=11.920695304870605
liger_forward grad[93] vs paper_forward: mean_abs=0.07829145342111588, max_abs=1.0, mean_rel=0.0352591909468174, max_rel=260.4039001464844, norm_rel=0.005998552311211824, ref_abs_avg=14.157875061035156, test_abs_avg=14.158032417297363
liger_forward grad[94] vs paper_forward: mean_abs=0.07388715445995331, max_abs=0.78125, mean_rel=0.03490469604730606, max_rel=179.3942108154297, norm_rel=0.005789421498775482, ref_abs_avg=13.921870231628418, test_abs_avg=13.923564910888672
liger_forward grad[95] vs paper_forward: mean_abs=0.059665679931640625, max_abs=0.25, mean_rel=0.015476579777896404, max_rel=0.820176362991333, norm_rel=0.00592032540589571, ref_abs_avg=11.001059532165527, test_abs_avg=11.002108573913574
liger_forward grad[96] vs paper_forward: mean_abs=0.07147720456123352, max_abs=1.0, mean_rel=0.033346064388751984, max_rel=152.5397491455078, norm_rel=0.005848351866006851, ref_abs_avg=13.39605712890625, test_abs_avg=13.396206855773926
liger_forward grad[97] vs paper_forward: mean_abs=0.06913578510284424, max_abs=1.0, mean_rel=0.03200627118349075, max_rel=149.83334350585938, norm_rel=0.005711707286536694, ref_abs_avg=13.405138969421387, test_abs_avg=13.401933670043945
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  48.519 ms
torch_compile_phases_forward bwd-only: 39.324 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.752 GiB
liger_forward fwd+bwd:  45.041 ms
liger_forward bwd-only: 32.587 ms
liger_forward peak allocated: fwd=7.727 GiB, fwd+bwd=7.727 GiB
liger_forward peak reserved:  fwd=7.775 GiB, fwd+bwd=8.088 GiB
production_forward fwd+bwd:  33.798 ms
production_forward bwd-only: 28.876 ms
production_forward peak allocated: fwd=1.174 GiB, fwd+bwd=5.176 GiB
production_forward peak reserved:  fwd=1.242 GiB, fwd+bwd=5.242 GiB
paper_forward fwd+bwd:  112.801 ms
paper_forward bwd-only: 89.003 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016724977176636457, max_abs=0.04296875
production_forward grad[0] vs paper_forward: mean_abs=0.008516526781022549, max_abs=0.46875, mean_rel=0.07227450609207153, max_rel=108.96517181396484, norm_rel=0.019648784771561623, ref_abs_avg=0.46910181641578674, test_abs_avg=0.46913641691207886
production_forward grad[1] vs paper_forward: mean_abs=5.291468620300293, max_abs=48.0, mean_rel=0.19528277218341827, max_rel=430.2606201171875, norm_rel=0.020427221432328224, ref_abs_avg=230.09219360351562, test_abs_avg=230.10694885253906
production_forward grad[2] vs paper_forward: mean_abs=0.9341864585876465, max_abs=3.0, mean_rel=0.2595764398574829, max_rel=62.98728942871094, norm_rel=0.023756101727485657, ref_abs_avg=38.06415939331055, test_abs_avg=38.08460235595703
production_forward grad[3] vs paper_forward: mean_abs=1.0924286842346191, max_abs=7.0, mean_rel=0.1669110655784607, max_rel=2819.43798828125, norm_rel=0.02272370271384716, ref_abs_avg=48.332984924316406, test_abs_avg=48.34049987792969
production_forward grad[4] vs paper_forward: mean_abs=1.0612750053405762, max_abs=7.5, mean_rel=0.15616124868392944, max_rel=1290.4925537109375, norm_rel=0.022520512342453003, ref_abs_avg=47.41577911376953, test_abs_avg=47.419158935546875
production_forward grad[5] vs paper_forward: mean_abs=0.8064274787902832, max_abs=3.0, mean_rel=0.14658035337924957, max_rel=26.93370246887207, norm_rel=0.023780517280101776, ref_abs_avg=33.662750244140625, test_abs_avg=33.56815719604492
production_forward grad[6] vs paper_forward: mean_abs=0.9522106647491455, max_abs=6.125, mean_rel=0.15206009149551392, max_rel=1136.345703125, norm_rel=0.02253420278429985, ref_abs_avg=42.48179626464844, test_abs_avg=42.48365783691406
production_forward grad[7] vs paper_forward: mean_abs=0.923182487487793, max_abs=6.0, mean_rel=0.15074479579925537, max_rel=781.8162231445312, norm_rel=0.022095108404755592, ref_abs_avg=42.08030700683594, test_abs_avg=42.07835388183594
production_forward grad[8] vs paper_forward: mean_abs=0.7303762435913086, max_abs=2.875, mean_rel=0.11215344816446304, max_rel=19.134979248046875, norm_rel=0.023385053500533104, ref_abs_avg=31.236682891845703, test_abs_avg=31.258953094482422
production_forward grad[9] vs paper_forward: mean_abs=0.8488761782646179, max_abs=5.5, mean_rel=0.15871864557266235, max_rel=964.2625732421875, norm_rel=0.02230996824800968, ref_abs_avg=38.27520751953125, test_abs_avg=38.2806396484375
production_forward grad[10] vs paper_forward: mean_abs=0.8286150693893433, max_abs=5.15625, mean_rel=0.1485186219215393, max_rel=1422.5924072265625, norm_rel=0.021991459652781487, ref_abs_avg=37.91631317138672, test_abs_avg=37.922245025634766
production_forward grad[11] vs paper_forward: mean_abs=0.6560225486755371, max_abs=2.5, mean_rel=0.10709111392498016, max_rel=10.25282096862793, norm_rel=0.022221235558390617, ref_abs_avg=28.711734771728516, test_abs_avg=28.692699432373047
production_forward grad[12] vs paper_forward: mean_abs=0.7959977388381958, max_abs=5.25, mean_rel=0.1452430933713913, max_rel=1630.3858642578125, norm_rel=0.02222512848675251, ref_abs_avg=36.041969299316406, test_abs_avg=36.04365539550781
production_forward grad[13] vs paper_forward: mean_abs=0.7707010507583618, max_abs=4.53125, mean_rel=0.15472638607025146, max_rel=664.0153198242188, norm_rel=0.02187400683760643, ref_abs_avg=35.468666076660156, test_abs_avg=35.46678161621094
production_forward grad[14] vs paper_forward: mean_abs=0.5793113708496094, max_abs=2.28125, mean_rel=0.10891605168581009, max_rel=15.435060501098633, norm_rel=0.020744500681757927, ref_abs_avg=28.146821975708008, test_abs_avg=28.119308471679688
production_forward grad[15] vs paper_forward: mean_abs=0.7417627573013306, max_abs=4.75, mean_rel=0.14396625757217407, max_rel=2009.933349609375, norm_rel=0.02191920019686222, ref_abs_avg=33.98966979980469, test_abs_avg=33.99365234375
production_forward grad[16] vs paper_forward: mean_abs=0.722216010093689, max_abs=5.375, mean_rel=0.13956251740455627, max_rel=1070.6923828125, norm_rel=0.02174513041973114, ref_abs_avg=33.371238708496094, test_abs_avg=33.37174987792969
production_forward grad[17] vs paper_forward: mean_abs=0.5746808648109436, max_abs=2.25, mean_rel=0.13249564170837402, max_rel=28.787494659423828, norm_rel=0.02240438014268875, ref_abs_avg=25.601337432861328, test_abs_avg=25.61768341064453
production_forward grad[18] vs paper_forward: mean_abs=0.6937823295593262, max_abs=4.291015625, mean_rel=0.15744316577911377, max_rel=2259.67919921875, norm_rel=0.02185499481856823, ref_abs_avg=31.904123306274414, test_abs_avg=31.90605354309082
production_forward grad[19] vs paper_forward: mean_abs=0.6762056350708008, max_abs=4.125, mean_rel=0.1432957947254181, max_rel=574.088134765625, norm_rel=0.021659692749381065, ref_abs_avg=31.35386848449707, test_abs_avg=31.355098724365234
production_forward grad[20] vs paper_forward: mean_abs=0.5268616676330566, max_abs=2.0, mean_rel=0.08909370750188828, max_rel=10.875325202941895, norm_rel=0.021415654569864273, ref_abs_avg=24.343849182128906, test_abs_avg=24.339914321899414
production_forward grad[21] vs paper_forward: mean_abs=0.6528925895690918, max_abs=4.0, mean_rel=0.14993712306022644, max_rel=755.6824340820312, norm_rel=0.021812664344906807, ref_abs_avg=30.055133819580078, test_abs_avg=30.058401107788086
production_forward grad[22] vs paper_forward: mean_abs=0.6437166333198547, max_abs=3.75, mean_rel=0.14242114126682281, max_rel=595.0462036132812, norm_rel=0.02157621644437313, ref_abs_avg=29.977703094482422, test_abs_avg=29.979190826416016
production_forward grad[23] vs paper_forward: mean_abs=0.5482578277587891, max_abs=2.0, mean_rel=0.1181977242231369, max_rel=12.858882904052734, norm_rel=0.02324243262410164, ref_abs_avg=23.595991134643555, test_abs_avg=23.610206604003906
production_forward grad[24] vs paper_forward: mean_abs=0.6253417730331421, max_abs=4.328125, mean_rel=0.14304381608963013, max_rel=921.4508056640625, norm_rel=0.021573340520262718, ref_abs_avg=29.100740432739258, test_abs_avg=29.102556228637695
production_forward grad[25] vs paper_forward: mean_abs=0.6126614809036255, max_abs=4.0, mean_rel=0.1473902016878128, max_rel=825.0908813476562, norm_rel=0.021488714963197708, ref_abs_avg=28.59889030456543, test_abs_avg=28.604448318481445
production_forward grad[26] vs paper_forward: mean_abs=0.6242196559906006, max_abs=3.8125, mean_rel=0.20234785974025726, max_rel=35.96190643310547, norm_rel=0.025272542610764503, ref_abs_avg=25.242263793945312, test_abs_avg=25.21878433227539
production_forward grad[27] vs paper_forward: mean_abs=0.7465537786483765, max_abs=5.015625, mean_rel=0.15599732100963593, max_rel=924.8359375, norm_rel=0.02395007759332657, ref_abs_avg=31.283863067626953, test_abs_avg=31.287315368652344
production_forward grad[28] vs paper_forward: mean_abs=0.733301043510437, max_abs=5.25, mean_rel=0.15297162532806396, max_rel=668.4334716796875, norm_rel=0.02359282784163952, ref_abs_avg=31.237028121948242, test_abs_avg=31.241594314575195
production_forward grad[29] vs paper_forward: mean_abs=0.5857944488525391, max_abs=2.5, mean_rel=0.17031842470169067, max_rel=19.124591827392578, norm_rel=0.02477884478867054, ref_abs_avg=23.293100357055664, test_abs_avg=23.314064025878906
production_forward grad[30] vs paper_forward: mean_abs=0.6931589245796204, max_abs=4.5, mean_rel=0.1628878265619278, max_rel=1480.2332763671875, norm_rel=0.024058902636170387, ref_abs_avg=28.919742584228516, test_abs_avg=28.92185401916504
production_forward grad[31] vs paper_forward: mean_abs=0.678051233291626, max_abs=4.5, mean_rel=0.16520199179649353, max_rel=804.7089233398438, norm_rel=0.02409099042415619, ref_abs_avg=28.233787536621094, test_abs_avg=28.237516403198242
production_forward grad[32] vs paper_forward: mean_abs=0.536414623260498, max_abs=2.21875, mean_rel=0.1442440301179886, max_rel=21.39091682434082, norm_rel=0.025296680629253387, ref_abs_avg=20.813899993896484, test_abs_avg=20.76214599609375
production_forward grad[33] vs paper_forward: mean_abs=0.6375893354415894, max_abs=4.5, mean_rel=0.15623733401298523, max_rel=1290.6376953125, norm_rel=0.023872030898928642, ref_abs_avg=26.783992767333984, test_abs_avg=26.785661697387695
production_forward grad[34] vs paper_forward: mean_abs=0.6271668672561646, max_abs=3.921875, mean_rel=0.15072043240070343, max_rel=1015.6807861328125, norm_rel=0.02375747263431549, ref_abs_avg=26.506996154785156, test_abs_avg=26.510265350341797
production_forward grad[35] vs paper_forward: mean_abs=0.4756890535354614, max_abs=1.875, mean_rel=0.1096135675907135, max_rel=10.515790939331055, norm_rel=0.023221975192427635, ref_abs_avg=20.659626007080078, test_abs_avg=20.65161895751953
production_forward grad[36] vs paper_forward: mean_abs=0.5965360403060913, max_abs=3.661773681640625, mean_rel=0.15269318222999573, max_rel=886.2666015625, norm_rel=0.023645823821425438, ref_abs_avg=25.26613998413086, test_abs_avg=25.267868041992188
production_forward grad[37] vs paper_forward: mean_abs=0.5835561752319336, max_abs=3.75, mean_rel=0.16708804666996002, max_rel=864.0731811523438, norm_rel=0.023599039763212204, ref_abs_avg=24.81521224975586, test_abs_avg=24.815784454345703
production_forward grad[38] vs paper_forward: mean_abs=0.4600238800048828, max_abs=1.9375, mean_rel=0.11799272894859314, max_rel=10.892309188842773, norm_rel=0.022547995671629906, ref_abs_avg=20.624536514282227, test_abs_avg=20.66278076171875
production_forward grad[39] vs paper_forward: mean_abs=0.5587010979652405, max_abs=3.671875, mean_rel=0.15969666838645935, max_rel=1252.7421875, norm_rel=0.023529917001724243, ref_abs_avg=23.803756713867188, test_abs_avg=23.805992126464844
production_forward grad[40] vs paper_forward: mean_abs=0.5460359454154968, max_abs=3.5, mean_rel=0.1556294709444046, max_rel=775.464599609375, norm_rel=0.023451771587133408, ref_abs_avg=23.359724044799805, test_abs_avg=23.356887817382812
production_forward grad[41] vs paper_forward: mean_abs=0.4431252181529999, max_abs=1.75, mean_rel=0.09522616118192673, max_rel=4.446803092956543, norm_rel=0.023975465446710587, ref_abs_avg=18.364498138427734, test_abs_avg=18.394487380981445
production_forward grad[42] vs paper_forward: mean_abs=0.5288942456245422, max_abs=3.37890625, mean_rel=0.15292860567569733, max_rel=1714.376953125, norm_rel=0.02312305010855198, ref_abs_avg=22.89861297607422, test_abs_avg=22.89995765686035
production_forward grad[43] vs paper_forward: mean_abs=0.5187113285064697, max_abs=3.0, mean_rel=0.142563134431839, max_rel=420.4057312011719, norm_rel=0.022893495857715607, ref_abs_avg=22.71341896057129, test_abs_avg=22.715848922729492
production_forward grad[44] vs paper_forward: mean_abs=0.4369049072265625, max_abs=1.55859375, mean_rel=0.09485629945993423, max_rel=6.277495384216309, norm_rel=0.023627622053027153, ref_abs_avg=18.647220611572266, test_abs_avg=18.62643051147461
production_forward grad[45] vs paper_forward: mean_abs=0.5070847272872925, max_abs=3.375, mean_rel=0.14482435584068298, max_rel=832.1600952148438, norm_rel=0.022978460416197777, ref_abs_avg=22.127626419067383, test_abs_avg=22.12775993347168
production_forward grad[46] vs paper_forward: mean_abs=0.498970091342926, max_abs=3.5, mean_rel=0.14924466609954834, max_rel=1138.524169921875, norm_rel=0.023034922778606415, ref_abs_avg=21.7298583984375, test_abs_avg=21.73238754272461
production_forward grad[47] vs paper_forward: mean_abs=0.3949818015098572, max_abs=1.5, mean_rel=0.2153138369321823, max_rel=64.88037109375, norm_rel=0.02328205667436123, ref_abs_avg=16.859283447265625, test_abs_avg=16.864730834960938
production_forward grad[48] vs paper_forward: mean_abs=0.4845343232154846, max_abs=3.25, mean_rel=0.14836806058883667, max_rel=931.1958618164062, norm_rel=0.02269715815782547, ref_abs_avg=21.37488555908203, test_abs_avg=21.37584686279297
production_forward grad[49] vs paper_forward: mean_abs=0.47470623254776, max_abs=3.0625, mean_rel=0.14383256435394287, max_rel=618.8759765625, norm_rel=0.022863732650876045, ref_abs_avg=20.83160400390625, test_abs_avg=20.829952239990234
production_forward grad[50] vs paper_forward: mean_abs=0.43869829177856445, max_abs=1.75, mean_rel=0.08566753566265106, max_rel=4.056047439575195, norm_rel=0.02442743256688118, ref_abs_avg=18.251928329467773, test_abs_avg=18.273618698120117
production_forward grad[51] vs paper_forward: mean_abs=0.5347669720649719, max_abs=3.75, mean_rel=0.15597733855247498, max_rel=1067.328125, norm_rel=0.02398543804883957, ref_abs_avg=22.320417404174805, test_abs_avg=22.32059097290039
production_forward grad[52] vs paper_forward: mean_abs=0.5238232612609863, max_abs=3.8125, mean_rel=0.14842316508293152, max_rel=564.5846557617188, norm_rel=0.023997128009796143, ref_abs_avg=21.92493438720703, test_abs_avg=21.924118041992188
production_forward grad[53] vs paper_forward: mean_abs=0.39333969354629517, max_abs=1.59375, mean_rel=0.08349236845970154, max_rel=3.992906093597412, norm_rel=0.023560093715786934, ref_abs_avg=16.9931640625, test_abs_avg=16.965423583984375
production_forward grad[54] vs paper_forward: mean_abs=0.4922170639038086, max_abs=3.25, mean_rel=0.1534109115600586, max_rel=1496.2818603515625, norm_rel=0.023725034669041634, ref_abs_avg=20.79314422607422, test_abs_avg=20.794557571411133
production_forward grad[55] vs paper_forward: mean_abs=0.48685556650161743, max_abs=3.9375, mean_rel=0.14980103075504303, max_rel=619.919921875, norm_rel=0.023755835369229317, ref_abs_avg=20.505279541015625, test_abs_avg=20.5036678314209
production_forward grad[56] vs paper_forward: mean_abs=0.3940126895904541, max_abs=1.5625, mean_rel=0.11369648575782776, max_rel=13.86386775970459, norm_rel=0.02441389113664627, ref_abs_avg=16.1185302734375, test_abs_avg=16.115276336669922
production_forward grad[57] vs paper_forward: mean_abs=0.462469220161438, max_abs=3.25, mean_rel=0.1465282291173935, max_rel=757.8321533203125, norm_rel=0.023389490321278572, ref_abs_avg=19.77801513671875, test_abs_avg=19.776344299316406
production_forward grad[58] vs paper_forward: mean_abs=0.45326825976371765, max_abs=3.25, mean_rel=0.14139792323112488, max_rel=467.3830261230469, norm_rel=0.02293046936392784, ref_abs_avg=19.75804901123047, test_abs_avg=19.765457153320312
production_forward grad[59] vs paper_forward: mean_abs=0.35539746284484863, max_abs=1.3125, mean_rel=0.1046825721859932, max_rel=6.5120849609375, norm_rel=0.022304601967334747, ref_abs_avg=15.895458221435547, test_abs_avg=15.892899513244629
production_forward grad[60] vs paper_forward: mean_abs=0.43506574630737305, max_abs=3.5, mean_rel=0.14212188124656677, max_rel=769.9266357421875, norm_rel=0.022846858948469162, ref_abs_avg=18.995981216430664, test_abs_avg=18.995777130126953
production_forward grad[61] vs paper_forward: mean_abs=0.41909027099609375, max_abs=2.875, mean_rel=0.15564344823360443, max_rel=643.3626708984375, norm_rel=0.022577257826924324, ref_abs_avg=18.581640243530273, test_abs_avg=18.5875301361084
production_forward grad[62] vs paper_forward: mean_abs=0.32279443740844727, max_abs=1.375, mean_rel=0.06355217099189758, max_rel=2.6580729484558105, norm_rel=0.022400954738259315, ref_abs_avg=14.815322875976562, test_abs_avg=14.829072952270508
production_forward grad[63] vs paper_forward: mean_abs=0.40483424067497253, max_abs=2.625, mean_rel=0.14044269919395447, max_rel=880.9564208984375, norm_rel=0.02250242978334427, ref_abs_avg=17.967273712158203, test_abs_avg=17.96766471862793
production_forward grad[64] vs paper_forward: mean_abs=0.4012759327888489, max_abs=2.65966796875, mean_rel=0.14390090107917786, max_rel=725.607421875, norm_rel=0.02230183221399784, ref_abs_avg=17.982826232910156, test_abs_avg=17.980052947998047
production_forward grad[65] vs paper_forward: mean_abs=0.3119617700576782, max_abs=1.09375, mean_rel=0.19586749374866486, max_rel=25.140792846679688, norm_rel=0.021477097645401955, ref_abs_avg=14.275096893310547, test_abs_avg=14.291521072387695
production_forward grad[66] vs paper_forward: mean_abs=0.3853732645511627, max_abs=3.0, mean_rel=0.14110350608825684, max_rel=651.7332763671875, norm_rel=0.022128939628601074, ref_abs_avg=17.4066162109375, test_abs_avg=17.407466888427734
production_forward grad[67] vs paper_forward: mean_abs=0.3788316547870636, max_abs=2.625, mean_rel=0.13890567421913147, max_rel=482.13525390625, norm_rel=0.02207118086516857, ref_abs_avg=17.19763946533203, test_abs_avg=17.196632385253906
production_forward grad[68] vs paper_forward: mean_abs=0.3149089813232422, max_abs=1.1875, mean_rel=0.14512862265110016, max_rel=27.766515731811523, norm_rel=0.0221308171749115, ref_abs_avg=14.206981658935547, test_abs_avg=14.221124649047852
production_forward grad[69] vs paper_forward: mean_abs=0.3689146637916565, max_abs=2.75, mean_rel=0.14049303531646729, max_rel=674.6536254882812, norm_rel=0.02180349826812744, ref_abs_avg=16.87228775024414, test_abs_avg=16.871768951416016
production_forward grad[70] vs paper_forward: mean_abs=0.3656547963619232, max_abs=2.625, mean_rel=0.13148026168346405, max_rel=429.54876708984375, norm_rel=0.02156721241772175, ref_abs_avg=16.94725799560547, test_abs_avg=16.948108673095703
production_forward grad[71] vs paper_forward: mean_abs=0.2864800691604614, max_abs=1.140625, mean_rel=0.10864283889532089, max_rel=20.767051696777344, norm_rel=0.021946728229522705, ref_abs_avg=13.429065704345703, test_abs_avg=13.400352478027344
production_forward grad[72] vs paper_forward: mean_abs=0.3533850312232971, max_abs=2.65625, mean_rel=0.14427103102207184, max_rel=692.8456420898438, norm_rel=0.021414684131741524, ref_abs_avg=16.501415252685547, test_abs_avg=16.501148223876953
production_forward grad[73] vs paper_forward: mean_abs=0.34657543897628784, max_abs=2.5, mean_rel=0.1381857991218567, max_rel=675.5613403320312, norm_rel=0.020970342680811882, ref_abs_avg=16.479467391967773, test_abs_avg=16.478042602539062
production_forward grad[74] vs paper_forward: mean_abs=0.3149375915527344, max_abs=1.25, mean_rel=0.07281025499105453, max_rel=3.283564329147339, norm_rel=0.02198413759469986, ref_abs_avg=14.361818313598633, test_abs_avg=14.358865737915039
production_forward grad[75] vs paper_forward: mean_abs=0.3909900188446045, max_abs=3.125, mean_rel=0.14075446128845215, max_rel=792.7630004882812, norm_rel=0.02272123284637928, ref_abs_avg=17.194286346435547, test_abs_avg=17.1947078704834
production_forward grad[76] vs paper_forward: mean_abs=0.38528865575790405, max_abs=2.59375, mean_rel=0.14179828763008118, max_rel=378.2105407714844, norm_rel=0.022688234224915504, ref_abs_avg=16.927600860595703, test_abs_avg=16.929264068603516
production_forward grad[77] vs paper_forward: mean_abs=0.2848787307739258, max_abs=1.375, mean_rel=0.07127126306295395, max_rel=6.26027774810791, norm_rel=0.020174665376544, ref_abs_avg=14.188911437988281, test_abs_avg=14.16211223602295
production_forward grad[78] vs paper_forward: mean_abs=0.3613927364349365, max_abs=2.75, mean_rel=0.136174738407135, max_rel=709.9568481445312, norm_rel=0.022098269313573837, ref_abs_avg=16.356372833251953, test_abs_avg=16.357097625732422
production_forward grad[79] vs paper_forward: mean_abs=0.3502652645111084, max_abs=2.625, mean_rel=0.13449618220329285, max_rel=370.0673828125, norm_rel=0.021707141771912575, ref_abs_avg=16.112218856811523, test_abs_avg=16.113677978515625
production_forward grad[80] vs paper_forward: mean_abs=0.27269530296325684, max_abs=1.046875, mean_rel=0.15564203262329102, max_rel=12.649084091186523, norm_rel=0.02149864472448826, ref_abs_avg=12.549245834350586, test_abs_avg=12.553145408630371
production_forward grad[81] vs paper_forward: mean_abs=0.33287909626960754, max_abs=2.75, mean_rel=0.1329670250415802, max_rel=550.2312622070312, norm_rel=0.021535351872444153, ref_abs_avg=15.448394775390625, test_abs_avg=15.44853401184082
production_forward grad[82] vs paper_forward: mean_abs=0.32235968112945557, max_abs=2.390625, mean_rel=0.1309472620487213, max_rel=729.7462158203125, norm_rel=0.021549014374613762, ref_abs_avg=15.011478424072266, test_abs_avg=15.015830993652344
production_forward grad[83] vs paper_forward: mean_abs=0.26091718673706055, max_abs=1.09375, mean_rel=0.06376494467258453, max_rel=3.9319307804107666, norm_rel=0.019779236987233162, ref_abs_avg=13.278539657592773, test_abs_avg=13.28893756866455
production_forward grad[84] vs paper_forward: mean_abs=0.30884480476379395, max_abs=3.078125, mean_rel=0.12862977385520935, max_rel=514.1076049804688, norm_rel=0.020900936797261238, ref_abs_avg=14.796744346618652, test_abs_avg=14.798038482666016
production_forward grad[85] vs paper_forward: mean_abs=0.30069082975387573, max_abs=2.75, mean_rel=0.12858696281909943, max_rel=468.0854187011719, norm_rel=0.020247532054781914, ref_abs_avg=14.892906188964844, test_abs_avg=14.90034294128418
production_forward grad[86] vs paper_forward: mean_abs=0.24694204330444336, max_abs=1.0625, mean_rel=0.0894966572523117, max_rel=8.493563652038574, norm_rel=0.020655356347560883, ref_abs_avg=12.043909072875977, test_abs_avg=12.0381498336792
production_forward grad[87] vs paper_forward: mean_abs=0.29397058486938477, max_abs=3.0, mean_rel=0.12875576317310333, max_rel=611.6529541015625, norm_rel=0.020674875006079674, ref_abs_avg=14.286233901977539, test_abs_avg=14.287761688232422
production_forward grad[88] vs paper_forward: mean_abs=0.2856985330581665, max_abs=2.5625, mean_rel=0.13305695354938507, max_rel=602.9246215820312, norm_rel=0.02024855464696884, ref_abs_avg=14.172165870666504, test_abs_avg=14.176727294921875
production_forward grad[89] vs paper_forward: mean_abs=0.24273812770843506, max_abs=0.875, mean_rel=0.08906170725822449, max_rel=3.455336570739746, norm_rel=0.021975664421916008, ref_abs_avg=10.88465404510498, test_abs_avg=10.885753631591797
production_forward grad[90] vs paper_forward: mean_abs=0.27595973014831543, max_abs=2.75, mean_rel=0.12603841722011566, max_rel=521.59228515625, norm_rel=0.02005486935377121, ref_abs_avg=13.835145950317383, test_abs_avg=13.83603572845459
production_forward grad[91] vs paper_forward: mean_abs=0.2734515070915222, max_abs=2.623046875, mean_rel=0.11615802347660065, max_rel=189.9570770263672, norm_rel=0.02014009654521942, ref_abs_avg=13.71701431274414, test_abs_avg=13.716232299804688
production_forward grad[92] vs paper_forward: mean_abs=0.2223522663116455, max_abs=1.1875, mean_rel=0.1399090439081192, max_rel=12.878469467163086, norm_rel=0.01981845684349537, ref_abs_avg=11.238252639770508, test_abs_avg=11.258015632629395
production_forward grad[93] vs paper_forward: mean_abs=0.2661838233470917, max_abs=3.125, mean_rel=0.12142028659582138, max_rel=360.8114318847656, norm_rel=0.019817762076854706, ref_abs_avg=13.562328338623047, test_abs_avg=13.5623779296875
production_forward grad[94] vs paper_forward: mean_abs=0.2509004473686218, max_abs=2.0, mean_rel=0.11496742814779282, max_rel=466.8924255371094, norm_rel=0.018918056041002274, ref_abs_avg=13.378315925598145, test_abs_avg=13.379262924194336
production_forward grad[95] vs paper_forward: mean_abs=0.20786428451538086, max_abs=0.875, mean_rel=0.06852696090936661, max_rel=4.9947309494018555, norm_rel=0.019559288397431374, ref_abs_avg=10.71237564086914, test_abs_avg=10.712827682495117
production_forward grad[96] vs paper_forward: mean_abs=0.2490912675857544, max_abs=2.5, mean_rel=0.11383180320262909, max_rel=469.41937255859375, norm_rel=0.019421875476837158, ref_abs_avg=13.006570816040039, test_abs_avg=13.008607864379883
production_forward grad[97] vs paper_forward: mean_abs=0.2549576759338379, max_abs=2.5, mean_rel=0.11447089910507202, max_rel=301.3953857421875, norm_rel=0.01996759884059429, ref_abs_avg=13.023574829101562, test_abs_avg=13.029669761657715
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001675519859418273, max_abs=0.04296875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00887923501431942, max_abs=0.375, mean_rel=0.0750141590833664, max_rel=116.98673248291016, norm_rel=0.020390989258885384, ref_abs_avg=0.46910181641578674, test_abs_avg=0.4691231846809387
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.383749961853027, max_abs=40.0, mean_rel=0.275328665971756, max_rel=1460.0731201171875, norm_rel=0.020837798714637756, ref_abs_avg=230.09219360351562, test_abs_avg=230.14674377441406
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.0055832862854004, max_abs=3.75, mean_rel=0.26599445939064026, max_rel=68.5491943359375, norm_rel=0.025719670578837395, ref_abs_avg=38.06415939331055, test_abs_avg=38.07513427734375
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.1355551481246948, max_abs=8.0, mean_rel=0.17293259501457214, max_rel=1440.8082275390625, norm_rel=0.023616347461938858, ref_abs_avg=48.332984924316406, test_abs_avg=48.337032318115234
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.1067676544189453, max_abs=7.5, mean_rel=0.16055895388126373, max_rel=1409.4632568359375, norm_rel=0.023469310253858566, ref_abs_avg=47.41577911376953, test_abs_avg=47.41520690917969
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.7858681678771973, max_abs=3.875, mean_rel=0.11104496568441391, max_rel=9.268683433532715, norm_rel=0.02367108128964901, ref_abs_avg=33.662750244140625, test_abs_avg=33.54894256591797
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.987816572189331, max_abs=6.375, mean_rel=0.16419485211372375, max_rel=2386.52197265625, norm_rel=0.023368192836642265, ref_abs_avg=42.48179626464844, test_abs_avg=42.48390579223633
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.9539260864257812, max_abs=7.0, mean_rel=0.1565397083759308, max_rel=965.2255859375, norm_rel=0.02281525544822216, ref_abs_avg=42.08030700683594, test_abs_avg=42.07643127441406
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7479573488235474, max_abs=3.8125, mean_rel=0.09647980332374573, max_rel=6.025532245635986, norm_rel=0.024284018203616142, ref_abs_avg=31.236682891845703, test_abs_avg=31.23167610168457
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.8788729906082153, max_abs=5.625, mean_rel=0.16399598121643066, max_rel=2172.582763671875, norm_rel=0.02307114750146866, ref_abs_avg=38.27520751953125, test_abs_avg=38.27979278564453
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8581568002700806, max_abs=5.875, mean_rel=0.16095435619354248, max_rel=1007.7601928710938, norm_rel=0.02276165410876274, ref_abs_avg=37.91631317138672, test_abs_avg=37.91863250732422
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6786737442016602, max_abs=2.75, mean_rel=0.13148623704910278, max_rel=18.370080947875977, norm_rel=0.023211177438497543, ref_abs_avg=28.711734771728516, test_abs_avg=28.68575668334961
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.8215099573135376, max_abs=4.875, mean_rel=0.14887018501758575, max_rel=1355.1343994140625, norm_rel=0.022918784990906715, ref_abs_avg=36.041969299316406, test_abs_avg=36.04118347167969
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7983806729316711, max_abs=5.5, mean_rel=0.1594703197479248, max_rel=646.1557006835938, norm_rel=0.022632205858826637, ref_abs_avg=35.468666076660156, test_abs_avg=35.46747589111328
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.5858263969421387, max_abs=2.0, mean_rel=0.0979451909661293, max_rel=8.5372314453125, norm_rel=0.021058907732367516, ref_abs_avg=28.146821975708008, test_abs_avg=28.136350631713867
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7673273086547852, max_abs=4.828125, mean_rel=0.14865922927856445, max_rel=2290.740478515625, norm_rel=0.022658174857497215, ref_abs_avg=33.98966979980469, test_abs_avg=33.99366760253906
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.7481700778007507, max_abs=4.875, mean_rel=0.14307266473770142, max_rel=1334.81787109375, norm_rel=0.02248566597700119, ref_abs_avg=33.371238708496094, test_abs_avg=33.37339782714844
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.575188159942627, max_abs=2.25, mean_rel=0.16756680607795715, max_rel=39.50986862182617, norm_rel=0.022558096796274185, ref_abs_avg=25.601337432861328, test_abs_avg=25.614784240722656
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.7149796485900879, max_abs=4.125, mean_rel=0.16218777000904083, max_rel=2351.28857421875, norm_rel=0.022517738863825798, ref_abs_avg=31.904123306274414, test_abs_avg=31.906164169311523
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.6941673755645752, max_abs=4.1875, mean_rel=0.1481570154428482, max_rel=912.6702880859375, norm_rel=0.022238217294216156, ref_abs_avg=31.35386848449707, test_abs_avg=31.354080200195312
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5495553016662598, max_abs=2.25, mean_rel=0.08329902589321136, max_rel=5.929440975189209, norm_rel=0.02248147688806057, ref_abs_avg=24.343849182128906, test_abs_avg=24.360998153686523
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6720927357673645, max_abs=4.2734375, mean_rel=0.15804961323738098, max_rel=1050.3839111328125, norm_rel=0.02243540622293949, ref_abs_avg=30.055133819580078, test_abs_avg=30.05754280090332
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6600103974342346, max_abs=4.25, mean_rel=0.15026268362998962, max_rel=688.4654541015625, norm_rel=0.022136088460683823, ref_abs_avg=29.977703094482422, test_abs_avg=29.978816986083984
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.5678162574768066, max_abs=2.25, mean_rel=0.1341872662305832, max_rel=12.699968338012695, norm_rel=0.024049267172813416, ref_abs_avg=23.595991134643555, test_abs_avg=23.611526489257812
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.642166256904602, max_abs=4.0625, mean_rel=0.1502499282360077, max_rel=857.593505859375, norm_rel=0.022152017802000046, ref_abs_avg=29.100740432739258, test_abs_avg=29.102243423461914
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.6282864809036255, max_abs=3.875, mean_rel=0.1506626009941101, max_rel=789.3666381835938, norm_rel=0.02201896347105503, ref_abs_avg=28.59889030456543, test_abs_avg=28.603290557861328
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.62735915184021, max_abs=3.25, mean_rel=0.17338544130325317, max_rel=26.476879119873047, norm_rel=0.025737790390849113, ref_abs_avg=25.242263793945312, test_abs_avg=25.240127563476562
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7662631273269653, max_abs=5.0, mean_rel=0.16398975253105164, max_rel=1216.6959228515625, norm_rel=0.02458500862121582, ref_abs_avg=31.283863067626953, test_abs_avg=31.284765243530273
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.7560240030288696, max_abs=4.9375, mean_rel=0.1653742492198944, max_rel=782.1560668945312, norm_rel=0.024312740191817284, ref_abs_avg=31.237028121948242, test_abs_avg=31.24319076538086
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.5658974647521973, max_abs=2.0625, mean_rel=0.16950610280036926, max_rel=23.50136375427246, norm_rel=0.024027220904827118, ref_abs_avg=23.293100357055664, test_abs_avg=23.3089656829834
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.7103814482688904, max_abs=4.625, mean_rel=0.16414490342140198, max_rel=882.7607421875, norm_rel=0.02462315745651722, ref_abs_avg=28.919742584228516, test_abs_avg=28.921077728271484
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6963821649551392, max_abs=4.0, mean_rel=0.1664133071899414, max_rel=822.0823364257812, norm_rel=0.024732060730457306, ref_abs_avg=28.233787536621094, test_abs_avg=28.235382080078125
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5314183235168457, max_abs=1.75, mean_rel=0.17649134993553162, max_rel=33.32464599609375, norm_rel=0.024810530245304108, ref_abs_avg=20.813899993896484, test_abs_avg=20.75323486328125
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.651978611946106, max_abs=5.0, mean_rel=0.15867449343204498, max_rel=947.1288452148438, norm_rel=0.02440359815955162, ref_abs_avg=26.783992767333984, test_abs_avg=26.785884857177734
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6412167549133301, max_abs=4.25, mean_rel=0.15489515662193298, max_rel=1299.612548828125, norm_rel=0.024268871173262596, ref_abs_avg=26.506996154785156, test_abs_avg=26.507827758789062
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.4900979995727539, max_abs=2.5, mean_rel=0.10644866526126862, max_rel=10.70547866821289, norm_rel=0.02352146990597248, ref_abs_avg=20.659626007080078, test_abs_avg=20.64505386352539
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.6087283492088318, max_abs=5.0, mean_rel=0.15418383479118347, max_rel=912.4893188476562, norm_rel=0.024138066917657852, ref_abs_avg=25.26613998413086, test_abs_avg=25.26717758178711
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5959938168525696, max_abs=3.75, mean_rel=0.16738995909690857, max_rel=1007.3064575195312, norm_rel=0.02409084513783455, ref_abs_avg=24.81521224975586, test_abs_avg=24.813243865966797
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.46432310342788696, max_abs=1.75, mean_rel=0.09287337213754654, max_rel=8.10239028930664, norm_rel=0.0225981492549181, ref_abs_avg=20.624536514282227, test_abs_avg=20.647964477539062
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5699124336242676, max_abs=3.8125, mean_rel=0.16120386123657227, max_rel=860.4098510742188, norm_rel=0.023986032232642174, ref_abs_avg=23.803756713867188, test_abs_avg=23.80510711669922
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5578910112380981, max_abs=4.25, mean_rel=0.15696242451667786, max_rel=682.0565795898438, norm_rel=0.02394561842083931, ref_abs_avg=23.359724044799805, test_abs_avg=23.356704711914062
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4489176273345947, max_abs=1.625, mean_rel=0.08938135206699371, max_rel=5.260951042175293, norm_rel=0.023941202089190483, ref_abs_avg=18.364498138427734, test_abs_avg=18.380027770996094
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.538776159286499, max_abs=3.25, mean_rel=0.15612535178661346, max_rel=1552.931884765625, norm_rel=0.023552533239126205, ref_abs_avg=22.89861297607422, test_abs_avg=22.89954948425293
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5294864177703857, max_abs=3.265625, mean_rel=0.14516249299049377, max_rel=473.5740051269531, norm_rel=0.023353546857833862, ref_abs_avg=22.71341896057129, test_abs_avg=22.7153263092041
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.43019580841064453, max_abs=1.6875, mean_rel=0.08646206557750702, max_rel=3.6208078861236572, norm_rel=0.023098310455679893, ref_abs_avg=18.647220611572266, test_abs_avg=18.618602752685547
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.5158998966217041, max_abs=3.8125, mean_rel=0.1484772264957428, max_rel=914.8116455078125, norm_rel=0.02335207164287567, ref_abs_avg=22.127626419067383, test_abs_avg=22.127561569213867
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.5053948163986206, max_abs=3.25, mean_rel=0.1573420912027359, max_rel=1130.8829345703125, norm_rel=0.023329032585024834, ref_abs_avg=21.7298583984375, test_abs_avg=21.730518341064453
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.3934037685394287, max_abs=1.5625, mean_rel=0.21073423326015472, max_rel=65.62806701660156, norm_rel=0.02339012175798416, ref_abs_avg=16.859283447265625, test_abs_avg=16.861251831054688
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.49201148748397827, max_abs=3.375, mean_rel=0.1501414179801941, max_rel=982.99853515625, norm_rel=0.023044422268867493, ref_abs_avg=21.37488555908203, test_abs_avg=21.37559700012207
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.48464107513427734, max_abs=3.15625, mean_rel=0.14643943309783936, max_rel=453.3179931640625, norm_rel=0.023338651284575462, ref_abs_avg=20.83160400390625, test_abs_avg=20.83104705810547
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.4530327320098877, max_abs=1.875, mean_rel=0.08648164570331573, max_rel=6.277321815490723, norm_rel=0.024902502074837685, ref_abs_avg=18.251928329467773, test_abs_avg=18.263423919677734
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5449596047401428, max_abs=3.5, mean_rel=0.15785524249076843, max_rel=1066.3115234375, norm_rel=0.02442898228764534, ref_abs_avg=22.320417404174805, test_abs_avg=22.320831298828125
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5343673229217529, max_abs=3.75, mean_rel=0.15666940808296204, max_rel=651.7825317382812, norm_rel=0.024466145783662796, ref_abs_avg=21.92493438720703, test_abs_avg=21.925308227539062
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.3853227496147156, max_abs=1.625, mean_rel=0.07525008916854858, max_rel=2.1851487159729004, norm_rel=0.023469582200050354, ref_abs_avg=16.9931640625, test_abs_avg=16.972564697265625
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.5010436177253723, max_abs=3.5, mean_rel=0.15524564683437347, max_rel=1580.585205078125, norm_rel=0.024136224761605263, ref_abs_avg=20.79314422607422, test_abs_avg=20.794925689697266
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.49499812722206116, max_abs=3.875, mean_rel=0.14872215688228607, max_rel=565.3312377929688, norm_rel=0.0241704024374485, ref_abs_avg=20.505279541015625, test_abs_avg=20.504539489746094
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.3891592025756836, max_abs=1.75, mean_rel=0.1314374804496765, max_rel=16.62256622314453, norm_rel=0.02431158721446991, ref_abs_avg=16.1185302734375, test_abs_avg=16.125200271606445
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.46974268555641174, max_abs=3.375, mean_rel=0.1487870216369629, max_rel=1092.920166015625, norm_rel=0.02375418320298195, ref_abs_avg=19.77801513671875, test_abs_avg=19.776384353637695
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.46061262488365173, max_abs=3.0, mean_rel=0.14484058320522308, max_rel=414.07891845703125, norm_rel=0.023310838267207146, ref_abs_avg=19.75804901123047, test_abs_avg=19.765522003173828
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.35596275329589844, max_abs=1.25, mean_rel=0.11629609763622284, max_rel=9.649374961853027, norm_rel=0.022136041894555092, ref_abs_avg=15.895458221435547, test_abs_avg=15.892671585083008
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.4413453936576843, max_abs=3.75, mean_rel=0.14377841353416443, max_rel=820.1694946289062, norm_rel=0.02316325716674328, ref_abs_avg=18.995981216430664, test_abs_avg=18.995637893676758
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.4246463477611542, max_abs=2.75, mean_rel=0.15366995334625244, max_rel=688.4057006835938, norm_rel=0.022871889173984528, ref_abs_avg=18.581640243530273, test_abs_avg=18.58827018737793
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.33077287673950195, max_abs=1.3125, mean_rel=0.07056356966495514, max_rel=4.039093017578125, norm_rel=0.02320765145123005, ref_abs_avg=14.815322875976562, test_abs_avg=14.81570816040039
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.4104214906692505, max_abs=2.625, mean_rel=0.1413087397813797, max_rel=675.7905883789062, norm_rel=0.022793004289269447, ref_abs_avg=17.967273712158203, test_abs_avg=17.967418670654297
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.4070447087287903, max_abs=2.625, mean_rel=0.141652911901474, max_rel=581.721923828125, norm_rel=0.02261531911790371, ref_abs_avg=17.982826232910156, test_abs_avg=17.981239318847656
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.31162822246551514, max_abs=1.125, mean_rel=0.1777930110692978, max_rel=29.944303512573242, norm_rel=0.02181771770119667, ref_abs_avg=14.275096893310547, test_abs_avg=14.290156364440918
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.38959330320358276, max_abs=3.0, mean_rel=0.14457586407661438, max_rel=669.6644287109375, norm_rel=0.02237088792026043, ref_abs_avg=17.4066162109375, test_abs_avg=17.407146453857422
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.3825662136077881, max_abs=2.625, mean_rel=0.138630211353302, max_rel=325.95574951171875, norm_rel=0.022271856665611267, ref_abs_avg=17.19763946533203, test_abs_avg=17.19936752319336
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.30339479446411133, max_abs=1.25, mean_rel=0.17561981081962585, max_rel=33.481021881103516, norm_rel=0.021522613242268562, ref_abs_avg=14.206981658935547, test_abs_avg=14.2177152633667
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.37279611825942993, max_abs=2.75, mean_rel=0.14255021512508392, max_rel=775.6480712890625, norm_rel=0.022039415314793587, ref_abs_avg=16.87228775024414, test_abs_avg=16.87175750732422
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.3716782033443451, max_abs=2.75, mean_rel=0.13716229796409607, max_rel=455.9158935546875, norm_rel=0.021902432665228844, ref_abs_avg=16.94725799560547, test_abs_avg=16.948719024658203
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.29161715507507324, max_abs=1.1875, mean_rel=0.09372054785490036, max_rel=8.474492073059082, norm_rel=0.02238256298005581, ref_abs_avg=13.429065704345703, test_abs_avg=13.39590072631836
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3566781282424927, max_abs=2.75, mean_rel=0.1467965543270111, max_rel=663.21728515625, norm_rel=0.021609624847769737, ref_abs_avg=16.501415252685547, test_abs_avg=16.501861572265625
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.34975534677505493, max_abs=2.375, mean_rel=0.14025148749351501, max_rel=582.7362060546875, norm_rel=0.02115541696548462, ref_abs_avg=16.479467391967773, test_abs_avg=16.479598999023438
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.31940484046936035, max_abs=1.5, mean_rel=0.07808889448642731, max_rel=5.012202262878418, norm_rel=0.022045142948627472, ref_abs_avg=14.361818313598633, test_abs_avg=14.350866317749023
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.39650818705558777, max_abs=3.375, mean_rel=0.13987646996974945, max_rel=648.021484375, norm_rel=0.023028556257486343, ref_abs_avg=17.194286346435547, test_abs_avg=17.194183349609375
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3867926597595215, max_abs=2.796875, mean_rel=0.14745700359344482, max_rel=708.6973876953125, norm_rel=0.02279917523264885, ref_abs_avg=16.927600860595703, test_abs_avg=16.927284240722656
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.293886661529541, max_abs=1.375, mean_rel=0.07570164650678635, max_rel=7.344146728515625, norm_rel=0.02075050212442875, ref_abs_avg=14.188911437988281, test_abs_avg=14.149362564086914
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.3656854033470154, max_abs=2.8125, mean_rel=0.13753387331962585, max_rel=798.2930908203125, norm_rel=0.022350413724780083, ref_abs_avg=16.356372833251953, test_abs_avg=16.35674285888672
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.3555454611778259, max_abs=3.03125, mean_rel=0.1376681625843048, max_rel=406.9659118652344, norm_rel=0.022067049518227577, ref_abs_avg=16.112218856811523, test_abs_avg=16.113069534301758
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.27118873596191406, max_abs=1.16650390625, mean_rel=0.1877676546573639, max_rel=19.169328689575195, norm_rel=0.021398942917585373, ref_abs_avg=12.549245834350586, test_abs_avg=12.553060531616211
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.3360247015953064, max_abs=2.875, mean_rel=0.13439038395881653, max_rel=512.3062133789062, norm_rel=0.02172168903052807, ref_abs_avg=15.448394775390625, test_abs_avg=15.448643684387207
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.326682448387146, max_abs=2.390625, mean_rel=0.13169477880001068, max_rel=623.062744140625, norm_rel=0.021838592365384102, ref_abs_avg=15.011478424072266, test_abs_avg=15.01445198059082
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.271028995513916, max_abs=1.03125, mean_rel=0.07282423973083496, max_rel=8.508647918701172, norm_rel=0.020804481580853462, ref_abs_avg=13.278539657592773, test_abs_avg=13.299118041992188
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.31176623702049255, max_abs=3.078125, mean_rel=0.1298130601644516, max_rel=505.2637634277344, norm_rel=0.02107751928269863, ref_abs_avg=14.796744346618652, test_abs_avg=14.797834396362305
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.308199942111969, max_abs=2.75, mean_rel=0.132501482963562, max_rel=553.0238647460938, norm_rel=0.020724724978208542, ref_abs_avg=14.892906188964844, test_abs_avg=14.902706146240234
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.23703742027282715, max_abs=1.0, mean_rel=0.11100251227617264, max_rel=15.052142143249512, norm_rel=0.020425109192728996, ref_abs_avg=12.043909072875977, test_abs_avg=12.042261123657227
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.2957287132740021, max_abs=3.0, mean_rel=0.1282726228237152, max_rel=601.1207275390625, norm_rel=0.020794445648789406, ref_abs_avg=14.286233901977539, test_abs_avg=14.28785514831543
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.2914887070655823, max_abs=2.5625, mean_rel=0.1322088986635208, max_rel=568.1669311523438, norm_rel=0.020605474710464478, ref_abs_avg=14.172165870666504, test_abs_avg=14.173637390136719
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.24057716131210327, max_abs=0.875, mean_rel=0.08429276943206787, max_rel=3.6153059005737305, norm_rel=0.021789977326989174, ref_abs_avg=10.88465404510498, test_abs_avg=10.879766464233398
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.27741116285324097, max_abs=2.75, mean_rel=0.12695816159248352, max_rel=560.0050048828125, norm_rel=0.020154466852545738, ref_abs_avg=13.835145950317383, test_abs_avg=13.836033821105957
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.27518314123153687, max_abs=2.716796875, mean_rel=0.11692903190851212, max_rel=221.07659912109375, norm_rel=0.020291371271014214, ref_abs_avg=13.71701431274414, test_abs_avg=13.717589378356934
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.23077630996704102, max_abs=1.0625, mean_rel=0.11677536368370056, max_rel=8.300655364990234, norm_rel=0.020308665931224823, ref_abs_avg=11.238252639770508, test_abs_avg=11.255188941955566
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.2669348418712616, max_abs=3.5, mean_rel=0.1216578260064125, max_rel=572.1837768554688, norm_rel=0.019875086843967438, ref_abs_avg=13.562328338623047, test_abs_avg=13.563321113586426
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.25178834795951843, max_abs=2.25, mean_rel=0.1176033616065979, max_rel=454.6437072753906, norm_rel=0.01895476132631302, ref_abs_avg=13.378315925598145, test_abs_avg=13.38149642944336
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.20840579271316528, max_abs=0.875, mean_rel=0.06268897652626038, max_rel=2.6208245754241943, norm_rel=0.019693518057465553, ref_abs_avg=10.71237564086914, test_abs_avg=10.70067024230957
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.24947313964366913, max_abs=2.5625, mean_rel=0.1143169105052948, max_rel=525.7714233398438, norm_rel=0.01946566440165043, ref_abs_avg=13.006570816040039, test_abs_avg=13.00925064086914
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.25247305631637573, max_abs=2.59375, mean_rel=0.11363424360752106, max_rel=321.03009033203125, norm_rel=0.019713707268238068, ref_abs_avg=13.023574829101562, test_abs_avg=13.029396057128906
liger_forward vs paper_forward output: mean_abs=0.0001530988229205832, max_abs=0.03125
liger_forward grad[0] vs paper_forward: mean_abs=0.0036065997555851936, max_abs=0.3203125, mean_rel=0.025630660355091095, max_rel=58.41661071777344, norm_rel=0.00965156126767397, ref_abs_avg=0.46910181641578674, test_abs_avg=0.4690794348716736
liger_forward grad[1] vs paper_forward: mean_abs=1.6107192039489746, max_abs=16.0, mean_rel=0.05294565111398697, max_rel=84.530029296875, norm_rel=0.006622500717639923, ref_abs_avg=230.09219360351562, test_abs_avg=230.07994079589844
liger_forward grad[2] vs paper_forward: mean_abs=0.3341546058654785, max_abs=1.5, mean_rel=0.07513079047203064, max_rel=8.427932739257812, norm_rel=0.009097971022129059, ref_abs_avg=38.06415939331055, test_abs_avg=38.05392074584961
liger_forward grad[3] vs paper_forward: mean_abs=0.41196921467781067, max_abs=3.0, mean_rel=0.0644756630063057, max_rel=645.00341796875, norm_rel=0.008836828172206879, ref_abs_avg=48.332984924316406, test_abs_avg=48.33182907104492
liger_forward grad[4] vs paper_forward: mean_abs=0.3947829008102417, max_abs=3.0, mean_rel=0.05579278618097305, max_rel=380.8564453125, norm_rel=0.008659969083964825, ref_abs_avg=47.41577911376953, test_abs_avg=47.40819549560547
liger_forward grad[5] vs paper_forward: mean_abs=0.2785515785217285, max_abs=1.0, mean_rel=0.03035372495651245, max_rel=2.7624311447143555, norm_rel=0.008805952966213226, ref_abs_avg=33.662750244140625, test_abs_avg=33.65776824951172
liger_forward grad[6] vs paper_forward: mean_abs=0.3520159423351288, max_abs=2.5, mean_rel=0.057777777314186096, max_rel=630.3062133789062, norm_rel=0.008615227416157722, ref_abs_avg=42.48179626464844, test_abs_avg=42.482147216796875
liger_forward grad[7] vs paper_forward: mean_abs=0.33740484714508057, max_abs=2.0, mean_rel=0.05243393033742905, max_rel=216.9375, norm_rel=0.00837419368326664, ref_abs_avg=42.08030700683594, test_abs_avg=42.07891082763672
liger_forward grad[8] vs paper_forward: mean_abs=0.24829506874084473, max_abs=1.125, mean_rel=0.031091801822185516, max_rel=2.1771345138549805, norm_rel=0.008237467147409916, ref_abs_avg=31.236682891845703, test_abs_avg=31.259170532226562
liger_forward grad[9] vs paper_forward: mean_abs=0.31012704968452454, max_abs=2.0, mean_rel=0.05723555386066437, max_rel=742.4971923828125, norm_rel=0.008433183655142784, ref_abs_avg=38.27520751953125, test_abs_avg=38.274986267089844
liger_forward grad[10] vs paper_forward: mean_abs=0.29925206303596497, max_abs=2.0, mean_rel=0.05841785669326782, max_rel=1121.546142578125, norm_rel=0.008248605765402317, ref_abs_avg=37.91631317138672, test_abs_avg=37.91688537597656
liger_forward grad[11] vs paper_forward: mean_abs=0.2365560531616211, max_abs=1.0, mean_rel=0.04770959913730621, max_rel=8.938505172729492, norm_rel=0.008522924967110157, ref_abs_avg=28.711734771728516, test_abs_avg=28.715885162353516
liger_forward grad[12] vs paper_forward: mean_abs=0.2885608673095703, max_abs=2.0, mean_rel=0.05048634856939316, max_rel=394.6601867675781, norm_rel=0.008342133834958076, ref_abs_avg=36.041969299316406, test_abs_avg=36.04267120361328
liger_forward grad[13] vs paper_forward: mean_abs=0.2743147611618042, max_abs=2.0, mean_rel=0.05857476592063904, max_rel=609.2064208984375, norm_rel=0.008088676258921623, ref_abs_avg=35.468666076660156, test_abs_avg=35.469329833984375
liger_forward grad[14] vs paper_forward: mean_abs=0.2199486643075943, max_abs=1.0, mean_rel=0.036441173404455185, max_rel=5.731499195098877, norm_rel=0.008244466967880726, ref_abs_avg=28.146821975708008, test_abs_avg=28.157554626464844
liger_forward grad[15] vs paper_forward: mean_abs=0.2653641104698181, max_abs=2.0, mean_rel=0.0499190092086792, max_rel=498.7479248046875, norm_rel=0.008139023557305336, ref_abs_avg=33.98966979980469, test_abs_avg=33.989402770996094
liger_forward grad[16] vs paper_forward: mean_abs=0.25486576557159424, max_abs=2.0, mean_rel=0.05003351718187332, max_rel=274.7469177246094, norm_rel=0.007980399765074253, ref_abs_avg=33.371238708496094, test_abs_avg=33.37080001831055
liger_forward grad[17] vs paper_forward: mean_abs=0.2106034755706787, max_abs=1.0, mean_rel=0.04466114938259125, max_rel=7.620049476623535, norm_rel=0.008457774296402931, ref_abs_avg=25.601337432861328, test_abs_avg=25.617048263549805
liger_forward grad[18] vs paper_forward: mean_abs=0.2439718246459961, max_abs=1.5, mean_rel=0.05882644280791283, max_rel=621.5690307617188, norm_rel=0.007992570288479328, ref_abs_avg=31.904123306274414, test_abs_avg=31.904071807861328
liger_forward grad[19] vs paper_forward: mean_abs=0.2348955124616623, max_abs=1.5, mean_rel=0.048880115151405334, max_rel=636.2639770507812, norm_rel=0.007862044498324394, ref_abs_avg=31.35386848449707, test_abs_avg=31.352249145507812
liger_forward grad[20] vs paper_forward: mean_abs=0.18769216537475586, max_abs=0.75, mean_rel=0.0261903814971447, max_rel=1.6017920970916748, norm_rel=0.007922311313450336, ref_abs_avg=24.343849182128906, test_abs_avg=24.34702491760254
liger_forward grad[21] vs paper_forward: mean_abs=0.2267685979604721, max_abs=1.875, mean_rel=0.05341193079948425, max_rel=315.56817626953125, norm_rel=0.007898102514445782, ref_abs_avg=30.055133819580078, test_abs_avg=30.055034637451172
liger_forward grad[22] vs paper_forward: mean_abs=0.22085264325141907, max_abs=1.5, mean_rel=0.051126934587955475, max_rel=672.5529174804688, norm_rel=0.00771866412833333, ref_abs_avg=29.977703094482422, test_abs_avg=29.97745132446289
liger_forward grad[23] vs paper_forward: mean_abs=0.1840524673461914, max_abs=0.75, mean_rel=0.035141997039318085, max_rel=2.01550555229187, norm_rel=0.00814877636730671, ref_abs_avg=23.595991134643555, test_abs_avg=23.610658645629883
liger_forward grad[24] vs paper_forward: mean_abs=0.21585336327552795, max_abs=1.5, mean_rel=0.04807295650243759, max_rel=273.0258483886719, norm_rel=0.007761329412460327, ref_abs_avg=29.100740432739258, test_abs_avg=29.100509643554688
liger_forward grad[25] vs paper_forward: mean_abs=0.20790769159793854, max_abs=1.5, mean_rel=0.049611903727054596, max_rel=256.1921691894531, norm_rel=0.007623882498592138, ref_abs_avg=28.59889030456543, test_abs_avg=28.598880767822266
liger_forward grad[26] vs paper_forward: mean_abs=0.1992628574371338, max_abs=0.75, mean_rel=0.047785162925720215, max_rel=4.980926990509033, norm_rel=0.008084569126367569, ref_abs_avg=25.242263793945312, test_abs_avg=25.24292755126953
liger_forward grad[27] vs paper_forward: mean_abs=0.24355663359165192, max_abs=1.5625, mean_rel=0.05069843679666519, max_rel=428.78826904296875, norm_rel=0.008127260021865368, ref_abs_avg=31.283863067626953, test_abs_avg=31.2845401763916
liger_forward grad[28] vs paper_forward: mean_abs=0.2351868450641632, max_abs=1.5, mean_rel=0.047292560338974, max_rel=196.46395874023438, norm_rel=0.0078881261870265, ref_abs_avg=31.237028121948242, test_abs_avg=31.236431121826172
liger_forward grad[29] vs paper_forward: mean_abs=0.16939258575439453, max_abs=0.70703125, mean_rel=0.04434484988451004, max_rel=3.009559154510498, norm_rel=0.007580806501209736, ref_abs_avg=23.293100357055664, test_abs_avg=23.288341522216797
liger_forward grad[30] vs paper_forward: mean_abs=0.2171459197998047, max_abs=1.5, mean_rel=0.048367343842983246, max_rel=294.0746154785156, norm_rel=0.007857924327254295, ref_abs_avg=28.919742584228516, test_abs_avg=28.919326782226562
liger_forward grad[31] vs paper_forward: mean_abs=0.2072814553976059, max_abs=1.5, mean_rel=0.04676176607608795, max_rel=217.07826232910156, norm_rel=0.007697077933698893, ref_abs_avg=28.233787536621094, test_abs_avg=28.2338924407959
liger_forward grad[32] vs paper_forward: mean_abs=0.1569221019744873, max_abs=0.75, mean_rel=0.03563179448246956, max_rel=2.4982621669769287, norm_rel=0.008012622594833374, ref_abs_avg=20.813899993896484, test_abs_avg=20.81072998046875
liger_forward grad[33] vs paper_forward: mean_abs=0.19669786095619202, max_abs=1.25, mean_rel=0.04802744835615158, max_rel=209.5336456298828, norm_rel=0.00769464485347271, ref_abs_avg=26.783992767333984, test_abs_avg=26.784135818481445
liger_forward grad[34] vs paper_forward: mean_abs=0.18983814120292664, max_abs=1.5, mean_rel=0.04257948696613312, max_rel=134.9759063720703, norm_rel=0.007530966307967901, ref_abs_avg=26.506996154785156, test_abs_avg=26.506542205810547
liger_forward grad[35] vs paper_forward: mean_abs=0.15733051300048828, max_abs=0.65625, mean_rel=0.028807468712329865, max_rel=1.8375959396362305, norm_rel=0.007820221595466137, ref_abs_avg=20.659626007080078, test_abs_avg=20.67308807373047
liger_forward grad[36] vs paper_forward: mean_abs=0.1814706027507782, max_abs=1.25, mean_rel=0.047332823276519775, max_rel=347.1198425292969, norm_rel=0.007528723683208227, ref_abs_avg=25.26613998413086, test_abs_avg=25.26644515991211
liger_forward grad[37] vs paper_forward: mean_abs=0.17455914616584778, max_abs=1.0234375, mean_rel=0.047817669808864594, max_rel=392.1694641113281, norm_rel=0.007398086134344339, ref_abs_avg=24.81521224975586, test_abs_avg=24.81464385986328
liger_forward grad[38] vs paper_forward: mean_abs=0.13724565505981445, max_abs=0.625, mean_rel=0.03415833041071892, max_rel=4.783664226531982, norm_rel=0.007125145755708218, ref_abs_avg=20.624536514282227, test_abs_avg=20.63165283203125
liger_forward grad[39] vs paper_forward: mean_abs=0.16864123940467834, max_abs=1.125, mean_rel=0.04742128774523735, max_rel=239.95941162109375, norm_rel=0.0074452287517488, ref_abs_avg=23.803756713867188, test_abs_avg=23.804176330566406
liger_forward grad[40] vs paper_forward: mean_abs=0.16167055070400238, max_abs=1.0625, mean_rel=0.04713897779583931, max_rel=268.2289123535156, norm_rel=0.0072872876189649105, ref_abs_avg=23.359724044799805, test_abs_avg=23.36037826538086
liger_forward grad[41] vs paper_forward: mean_abs=0.13753175735473633, max_abs=0.5625, mean_rel=0.035850998014211655, max_rel=2.803565263748169, norm_rel=0.0078125, ref_abs_avg=18.364498138427734, test_abs_avg=18.390384674072266
liger_forward grad[42] vs paper_forward: mean_abs=0.1589697301387787, max_abs=1.25, mean_rel=0.045408040285110474, max_rel=232.41128540039062, norm_rel=0.007302809040993452, ref_abs_avg=22.89861297607422, test_abs_avg=22.89836883544922
liger_forward grad[43] vs paper_forward: mean_abs=0.15529924631118774, max_abs=1.09375, mean_rel=0.043446969240903854, max_rel=161.28675842285156, norm_rel=0.007206697016954422, ref_abs_avg=22.71341896057129, test_abs_avg=22.712688446044922
liger_forward grad[44] vs paper_forward: mean_abs=0.12244415283203125, max_abs=0.5, mean_rel=0.025000274181365967, max_rel=1.2497857809066772, norm_rel=0.007046529091894627, ref_abs_avg=18.647220611572266, test_abs_avg=18.65148162841797
liger_forward grad[45] vs paper_forward: mean_abs=0.15121608972549438, max_abs=1.25, mean_rel=0.04277480021119118, max_rel=276.29327392578125, norm_rel=0.007201840169727802, ref_abs_avg=22.127626419067383, test_abs_avg=22.127256393432617
liger_forward grad[46] vs paper_forward: mean_abs=0.14616996049880981, max_abs=1.25, mean_rel=0.04419088363647461, max_rel=227.58714294433594, norm_rel=0.00711215240880847, ref_abs_avg=21.7298583984375, test_abs_avg=21.730327606201172
liger_forward grad[47] vs paper_forward: mean_abs=0.12296003103256226, max_abs=0.5, mean_rel=0.05093877762556076, max_rel=11.887237548828125, norm_rel=0.007563118357211351, ref_abs_avg=16.859283447265625, test_abs_avg=16.865234375
liger_forward grad[48] vs paper_forward: mean_abs=0.14343935251235962, max_abs=1.0, mean_rel=0.04315251111984253, max_rel=318.882568359375, norm_rel=0.007078555878251791, ref_abs_avg=21.37488555908203, test_abs_avg=21.374488830566406
liger_forward grad[49] vs paper_forward: mean_abs=0.13847537338733673, max_abs=1.0, mean_rel=0.041030965745449066, max_rel=153.38755798339844, norm_rel=0.00702932383865118, ref_abs_avg=20.83160400390625, test_abs_avg=20.831668853759766
liger_forward grad[50] vs paper_forward: mean_abs=0.1408991813659668, max_abs=0.57421875, mean_rel=0.03409228473901749, max_rel=3.2297332286834717, norm_rel=0.0081422608345747, ref_abs_avg=18.251928329467773, test_abs_avg=18.23790740966797
liger_forward grad[51] vs paper_forward: mean_abs=0.16134780645370483, max_abs=1.125, mean_rel=0.046372056007385254, max_rel=129.0754852294922, norm_rel=0.00757893081754446, ref_abs_avg=22.320417404174805, test_abs_avg=22.32024574279785
liger_forward grad[52] vs paper_forward: mean_abs=0.15587332844734192, max_abs=1.0234375, mean_rel=0.044615987688302994, max_rel=198.24879455566406, norm_rel=0.007492957636713982, ref_abs_avg=21.92493438720703, test_abs_avg=21.92452621459961
liger_forward grad[53] vs paper_forward: mean_abs=0.11528158187866211, max_abs=0.5, mean_rel=0.02416759729385376, max_rel=1.5120761394500732, norm_rel=0.007291196845471859, ref_abs_avg=16.9931640625, test_abs_avg=16.978107452392578
liger_forward grad[54] vs paper_forward: mean_abs=0.1462308019399643, max_abs=1.0625, mean_rel=0.04629301652312279, max_rel=293.1642150878906, norm_rel=0.0073870406486094, ref_abs_avg=20.79314422607422, test_abs_avg=20.793155670166016
liger_forward grad[55] vs paper_forward: mean_abs=0.1410680115222931, max_abs=1.0, mean_rel=0.043024200946092606, max_rel=369.74755859375, norm_rel=0.007246049586683512, ref_abs_avg=20.505279541015625, test_abs_avg=20.505706787109375
liger_forward grad[56] vs paper_forward: mean_abs=0.11131006479263306, max_abs=0.5, mean_rel=0.032849375158548355, max_rel=4.118985176086426, norm_rel=0.007386454846709967, ref_abs_avg=16.1185302734375, test_abs_avg=16.10810661315918
liger_forward grad[57] vs paper_forward: mean_abs=0.13624589145183563, max_abs=1.0, mean_rel=0.04550711438059807, max_rel=255.6707763671875, norm_rel=0.007249366957694292, ref_abs_avg=19.77801513671875, test_abs_avg=19.77814292907715
liger_forward grad[58] vs paper_forward: mean_abs=0.13128092885017395, max_abs=1.0, mean_rel=0.039989203214645386, max_rel=111.248291015625, norm_rel=0.007000322453677654, ref_abs_avg=19.75804901123047, test_abs_avg=19.757572174072266
liger_forward grad[59] vs paper_forward: mean_abs=0.11318683624267578, max_abs=0.40625, mean_rel=0.030271358788013458, max_rel=2.367673397064209, norm_rel=0.007399567402899265, ref_abs_avg=15.895458221435547, test_abs_avg=15.899764060974121
liger_forward grad[60] vs paper_forward: mean_abs=0.12678135931491852, max_abs=1.0, mean_rel=0.0415877103805542, max_rel=148.6251678466797, norm_rel=0.007020805962383747, ref_abs_avg=18.995981216430664, test_abs_avg=18.995662689208984
liger_forward grad[61] vs paper_forward: mean_abs=0.12187868356704712, max_abs=1.0, mean_rel=0.04492318257689476, max_rel=154.64036560058594, norm_rel=0.0069345589727163315, ref_abs_avg=18.581640243530273, test_abs_avg=18.581851959228516
liger_forward grad[62] vs paper_forward: mean_abs=0.10799312591552734, max_abs=0.5, mean_rel=0.027440108358860016, max_rel=1.8804957866668701, norm_rel=0.0075641535222530365, ref_abs_avg=14.815322875976562, test_abs_avg=14.82030200958252
liger_forward grad[63] vs paper_forward: mean_abs=0.11803003400564194, max_abs=1.0, mean_rel=0.041776932775974274, max_rel=223.62579345703125, norm_rel=0.0069296048022806644, ref_abs_avg=17.967273712158203, test_abs_avg=17.967098236083984
liger_forward grad[64] vs paper_forward: mean_abs=0.11470484733581543, max_abs=1.0, mean_rel=0.03981565684080124, max_rel=187.42027282714844, norm_rel=0.006756540387868881, ref_abs_avg=17.982826232910156, test_abs_avg=17.983135223388672
liger_forward grad[65] vs paper_forward: mean_abs=0.09046375751495361, max_abs=0.4375, mean_rel=0.04728885367512703, max_rel=6.504058361053467, norm_rel=0.0067347194999456406, ref_abs_avg=14.275096893310547, test_abs_avg=14.273120880126953
liger_forward grad[66] vs paper_forward: mean_abs=0.11174831539392471, max_abs=1.0, mean_rel=0.0407063290476799, max_rel=147.1247100830078, norm_rel=0.006781087722629309, ref_abs_avg=17.4066162109375, test_abs_avg=17.406307220458984
liger_forward grad[67] vs paper_forward: mean_abs=0.10753738880157471, max_abs=0.75, mean_rel=0.04026825353503227, max_rel=114.4000015258789, norm_rel=0.006641867104917765, ref_abs_avg=17.19763946533203, test_abs_avg=17.197553634643555
liger_forward grad[68] vs paper_forward: mean_abs=0.08702301979064941, max_abs=0.375, mean_rel=0.02304811403155327, max_rel=2.0705432891845703, norm_rel=0.006449833046644926, ref_abs_avg=14.206981658935547, test_abs_avg=14.200532913208008
liger_forward grad[69] vs paper_forward: mean_abs=0.10577739775180817, max_abs=1.0, mean_rel=0.04065746068954468, max_rel=165.8526153564453, norm_rel=0.006631511729210615, ref_abs_avg=16.87228775024414, test_abs_avg=16.871967315673828
liger_forward grad[70] vs paper_forward: mean_abs=0.10133329033851624, max_abs=0.75, mean_rel=0.03784111142158508, max_rel=167.01808166503906, norm_rel=0.006371432915329933, ref_abs_avg=16.94725799560547, test_abs_avg=16.946136474609375
liger_forward grad[71] vs paper_forward: mean_abs=0.08595776557922363, max_abs=0.375, mean_rel=0.03757338225841522, max_rel=6.3670525550842285, norm_rel=0.006867363583296537, ref_abs_avg=13.429065704345703, test_abs_avg=13.430139541625977
liger_forward grad[72] vs paper_forward: mean_abs=0.10204952955245972, max_abs=1.0, mean_rel=0.04089558124542236, max_rel=200.60787963867188, norm_rel=0.006569467484951019, ref_abs_avg=16.501415252685547, test_abs_avg=16.50107192993164
liger_forward grad[73] vs paper_forward: mean_abs=0.0986945703625679, max_abs=1.0, mean_rel=0.039069972932338715, max_rel=180.83399963378906, norm_rel=0.0063747381791472435, ref_abs_avg=16.479467391967773, test_abs_avg=16.47832679748535
liger_forward grad[74] vs paper_forward: mean_abs=0.1008598804473877, max_abs=0.46484375, mean_rel=0.031560368835926056, max_rel=2.3384199142456055, norm_rel=0.007356458343565464, ref_abs_avg=14.361818313598633, test_abs_avg=14.357203483581543
liger_forward grad[75] vs paper_forward: mean_abs=0.11572135239839554, max_abs=1.0, mean_rel=0.042235665023326874, max_rel=137.2855682373047, norm_rel=0.0070799728855490685, ref_abs_avg=17.194286346435547, test_abs_avg=17.19393539428711
liger_forward grad[76] vs paper_forward: mean_abs=0.11161915957927704, max_abs=1.0, mean_rel=0.045609649270772934, max_rel=254.91539001464844, norm_rel=0.006957423873245716, ref_abs_avg=16.927600860595703, test_abs_avg=16.927845001220703
liger_forward grad[77] vs paper_forward: mean_abs=0.08960556983947754, max_abs=0.5, mean_rel=0.019939307123422623, max_rel=0.802635669708252, norm_rel=0.006747325882315636, ref_abs_avg=14.188911437988281, test_abs_avg=14.184144973754883
liger_forward grad[78] vs paper_forward: mean_abs=0.10561154037714005, max_abs=0.75, mean_rel=0.040241584181785583, max_rel=205.68450927734375, norm_rel=0.006824948824942112, ref_abs_avg=16.356372833251953, test_abs_avg=16.356204986572266
liger_forward grad[79] vs paper_forward: mean_abs=0.10126807540655136, max_abs=1.0, mean_rel=0.037690285593271255, max_rel=109.03559112548828, norm_rel=0.006653800141066313, ref_abs_avg=16.112218856811523, test_abs_avg=16.111492156982422
liger_forward grad[80] vs paper_forward: mean_abs=0.08122038841247559, max_abs=0.375, mean_rel=0.05369634926319122, max_rel=9.984025001525879, norm_rel=0.006827466655522585, ref_abs_avg=12.549245834350586, test_abs_avg=12.54619026184082
liger_forward grad[81] vs paper_forward: mean_abs=0.097645103931427, max_abs=1.0, mean_rel=0.04129388555884361, max_rel=201.78453063964844, norm_rel=0.006688497960567474, ref_abs_avg=15.448394775390625, test_abs_avg=15.448115348815918
liger_forward grad[82] vs paper_forward: mean_abs=0.09380613267421722, max_abs=0.8125, mean_rel=0.039331890642642975, max_rel=220.11192321777344, norm_rel=0.006662178784608841, ref_abs_avg=15.011478424072266, test_abs_avg=15.010072708129883
liger_forward grad[83] vs paper_forward: mean_abs=0.07547616958618164, max_abs=0.3125, mean_rel=0.028576642274856567, max_rel=5.322645664215088, norm_rel=0.006335521582514048, ref_abs_avg=13.278539657592773, test_abs_avg=13.271480560302734
liger_forward grad[84] vs paper_forward: mean_abs=0.09103459119796753, max_abs=1.0, mean_rel=0.03831053525209427, max_rel=144.59483337402344, norm_rel=0.0065351566299796104, ref_abs_avg=14.796744346618652, test_abs_avg=14.796584129333496
liger_forward grad[85] vs paper_forward: mean_abs=0.08772297203540802, max_abs=0.75, mean_rel=0.03637309372425079, max_rel=112.76814270019531, norm_rel=0.006293147802352905, ref_abs_avg=14.892906188964844, test_abs_avg=14.891858100891113
liger_forward grad[86] vs paper_forward: mean_abs=0.07351827621459961, max_abs=0.3125, mean_rel=0.019226573407649994, max_rel=1.3646732568740845, norm_rel=0.006659140810370445, ref_abs_avg=12.043909072875977, test_abs_avg=12.046542167663574
liger_forward grad[87] vs paper_forward: mean_abs=0.0857117623090744, max_abs=1.0, mean_rel=0.03783699870109558, max_rel=187.95323181152344, norm_rel=0.006408986635506153, ref_abs_avg=14.286233901977539, test_abs_avg=14.2864408493042
liger_forward grad[88] vs paper_forward: mean_abs=0.0827370434999466, max_abs=1.0, mean_rel=0.038925305008888245, max_rel=344.9041442871094, norm_rel=0.006263426039367914, ref_abs_avg=14.172165870666504, test_abs_avg=14.173360824584961
liger_forward grad[89] vs paper_forward: mean_abs=0.0678742527961731, max_abs=0.25, mean_rel=0.025348000228405, max_rel=1.4340776205062866, norm_rel=0.006465554237365723, ref_abs_avg=10.88465404510498, test_abs_avg=10.882837295532227
liger_forward grad[90] vs paper_forward: mean_abs=0.07968226075172424, max_abs=1.0, mean_rel=0.03563632071018219, max_rel=180.6793670654297, norm_rel=0.0061968122608959675, ref_abs_avg=13.835145950317383, test_abs_avg=13.835190773010254
liger_forward grad[91] vs paper_forward: mean_abs=0.0775345116853714, max_abs=0.75, mean_rel=0.033422477543354034, max_rel=89.85023498535156, norm_rel=0.006139530334621668, ref_abs_avg=13.71701431274414, test_abs_avg=13.718732833862305
liger_forward grad[92] vs paper_forward: mean_abs=0.06511783599853516, max_abs=0.265625, mean_rel=0.03933883085846901, max_rel=5.4660139083862305, norm_rel=0.006330527830868959, ref_abs_avg=11.238252639770508, test_abs_avg=11.243078231811523
liger_forward grad[93] vs paper_forward: mean_abs=0.07632049918174744, max_abs=1.0, mean_rel=0.035428352653980255, max_rel=135.62774658203125, norm_rel=0.006086855195462704, ref_abs_avg=13.562328338623047, test_abs_avg=13.562192916870117
liger_forward grad[94] vs paper_forward: mean_abs=0.07483525574207306, max_abs=0.75, mean_rel=0.034774068742990494, max_rel=125.97010803222656, norm_rel=0.006095344200730324, ref_abs_avg=13.378315925598145, test_abs_avg=13.377277374267578
liger_forward grad[95] vs paper_forward: mean_abs=0.06508493423461914, max_abs=0.25, mean_rel=0.038333453238010406, max_rel=7.761043071746826, norm_rel=0.006413795053958893, ref_abs_avg=10.71237564086914, test_abs_avg=10.713566780090332
liger_forward grad[96] vs paper_forward: mean_abs=0.07100486755371094, max_abs=0.75, mean_rel=0.03374098613858223, max_rel=165.2021942138672, norm_rel=0.005960205104202032, ref_abs_avg=13.006570816040039, test_abs_avg=13.0064115524292
liger_forward grad[97] vs paper_forward: mean_abs=0.07050357758998871, max_abs=0.75, mean_rel=0.031211011111736298, max_rel=60.493526458740234, norm_rel=0.005946963559836149, ref_abs_avg=13.023574829101562, test_abs_avg=13.025157928466797
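
Note on the comparison metrics: the log prints mean_abs, max_abs, mean_rel, max_rel, norm_rel, ref_abs_avg and test_abs_avg but never defines them. The sketch below is one plausible reading, not the benchmark's actual code. The function name `compare`, the `eps` guard, and every formula in it are assumptions made for illustration. It shows how a single near-zero reference element can push max_rel into the hundreds while norm_rel stays below one percent, which is the pattern visible in the lines above.

```python
import math

def compare(ref, test, eps=1e-12):
    """Hypothetical reconstruction of the per-tensor metrics (assumed definitions)."""
    n = len(ref)
    # Element-wise absolute error between the tested and reference values.
    diffs = [abs(t - r) for r, t in zip(ref, test)]
    # Element-wise relative error; eps (an assumption) avoids division by zero.
    rels = [d / (abs(r) + eps) for d, r in zip(diffs, ref)]
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    ref_norm = math.sqrt(sum(r * r for r in ref))
    return {
        "mean_abs": sum(diffs) / n,
        "max_abs": max(diffs),
        "mean_rel": sum(rels) / n,
        "max_rel": max(rels),
        # Error norm relative to the reference norm: insensitive to tiny entries.
        "norm_rel": diff_norm / (ref_norm + eps),
        "ref_abs_avg": sum(abs(r) for r in ref) / n,
        "test_abs_avg": sum(abs(t) for t in test) / n,
    }

# Made-up toy data: three exact matches plus one tiny reference entry that is off by 0.25.
ref = [10.0, -20.0, 0.001, 30.0]
test = [10.0, -20.0, 0.251, 30.0]
for name, value in compare(ref, test).items():
    print(f"{name}={value}")
```

Under these assumed definitions, norm_rel is the more reliable summary for gradient tensors that contain near-zero entries; max_rel mostly reflects the smallest reference magnitude.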
identity layers + randn queries
paper_forward fwd+bwd:  112.811 ms
paper_forward bwd-only: 88.947 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB
liger_forward fwd+bwd:  45.740 ms
liger_forward bwd-only: 33.385 ms
liger_forward peak allocated: fwd=7.727 GiB, fwd+bwd=7.727 GiB
liger_forward peak reserved:  fwd=7.775 GiB, fwd+bwd=8.088 GiB
torch_compile_phases_forward fwd+bwd:  48.550 ms
torch_compile_phases_forward bwd-only: 39.294 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.752 GiB
production_forward fwd+bwd:  33.807 ms
production_forward bwd-only: 28.863 ms
production_forward peak allocated: fwd=1.174 GiB, fwd+bwd=5.176 GiB
production_forward peak reserved:  fwd=1.242 GiB, fwd+bwd=5.242 GiB
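
The timing and memory figures above are easier to compare as ratios against paper_forward. The snippet below is a small sketch of that arithmetic, using only the fwd+bwd latency and fwd+bwd peak-allocated values copied from this log. The helper name `relative_to_baseline` and the choice of paper_forward as the baseline are assumptions for illustration.

```python
# (fwd+bwd latency in ms, fwd+bwd peak allocated in GiB), copied from the log above.
results = {
    "paper_forward": (112.811, 15.990),
    "liger_forward": (45.740, 7.727),
    "torch_compile_phases_forward": (48.550, 6.784),
    "production_forward": (33.807, 5.176),
}

def relative_to_baseline(results, baseline="paper_forward"):
    """Return (speedup, memory fraction) per variant relative to the baseline."""
    base_ms, base_gib = results[baseline]
    return {
        name: (base_ms / ms, gib / base_gib)
        for name, (ms, gib) in results.items()
    }

for name, (speedup, mem_frac) in relative_to_baseline(results).items():
    print(f"{name}: {speedup:.2f}x baseline speed, "
          f"{mem_frac:.0%} of baseline peak allocated memory")
```

These ratios describe only this run (identity layers + randn queries); the log does not say how many iterations were averaged, so small differences between variants should be read with caution.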

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0015769866295158863, max_abs=0.037109375
production_forward grad[0] vs paper_forward: mean_abs=0.007859058678150177, max_abs=0.390625, mean_rel=0.06988957524299622, max_rel=157.60096740722656, norm_rel=0.019200744107365608, ref_abs_avg=0.4447036385536194, test_abs_avg=0.4447169899940491
production_forward grad[1] vs paper_forward: mean_abs=4.839491367340088, max_abs=32.0, mean_rel=0.16768722236156464, max_rel=236.33311462402344, norm_rel=0.019228406250476837, ref_abs_avg=216.8657684326172, test_abs_avg=216.8358612060547
production_forward grad[2] vs paper_forward: mean_abs=0.8197860717773438, max_abs=3.25, mean_rel=0.08125274628400803, max_rel=4.45094633102417, norm_rel=0.023563912138342857, ref_abs_avg=34.70317840576172, test_abs_avg=34.71815490722656
production_forward grad[3] vs paper_forward: mean_abs=0.9835494756698608, max_abs=6.375, mean_rel=0.16269871592521667, max_rel=1085.9971923828125, norm_rel=0.022543519735336304, ref_abs_avg=43.88035583496094, test_abs_avg=43.88097381591797
production_forward grad[4] vs paper_forward: mean_abs=0.9637361764907837, max_abs=6.0390625, mean_rel=0.15006619691848755, max_rel=938.9013061523438, norm_rel=0.02244349755346775, ref_abs_avg=43.18003463745117, test_abs_avg=43.18260192871094
production_forward grad[5] vs paper_forward: mean_abs=0.7409038543701172, max_abs=3.125, mean_rel=0.08312097936868668, max_rel=6.297507286071777, norm_rel=0.023094110190868378, ref_abs_avg=31.88287353515625, test_abs_avg=31.91669273376465
production_forward grad[6] vs paper_forward: mean_abs=0.8760613203048706, max_abs=5.5, mean_rel=0.158050999045372, max_rel=1290.12841796875, norm_rel=0.022428520023822784, ref_abs_avg=39.260986328125, test_abs_avg=39.264488220214844
production_forward grad[7] vs paper_forward: mean_abs=0.8524292707443237, max_abs=5.0, mean_rel=0.15656554698944092, max_rel=657.12353515625, norm_rel=0.022195758298039436, ref_abs_avg=38.664772033691406, test_abs_avg=38.66284942626953
production_forward grad[8] vs paper_forward: mean_abs=0.64847731590271, max_abs=2.3125, mean_rel=0.23798523843288422, max_rel=62.97290802001953, norm_rel=0.02037200704216957, ref_abs_avg=30.479751586914062, test_abs_avg=30.468963623046875
production_forward grad[9] vs paper_forward: mean_abs=0.7940187454223633, max_abs=4.75, mean_rel=0.154868945479393, max_rel=1471.197998046875, norm_rel=0.022105908021330833, ref_abs_avg=36.08742904663086, test_abs_avg=36.088314056396484
production_forward grad[10] vs paper_forward: mean_abs=0.7737361192703247, max_abs=4.75, mean_rel=0.13864418864250183, max_rel=891.1212768554688, norm_rel=0.021799320355057716, ref_abs_avg=35.6722412109375, test_abs_avg=35.674774169921875
production_forward grad[11] vs paper_forward: mean_abs=0.5931310653686523, max_abs=2.75, mean_rel=0.09127020090818405, max_rel=5.370563983917236, norm_rel=0.02236134558916092, ref_abs_avg=26.688491821289062, test_abs_avg=26.651649475097656
production_forward grad[12] vs paper_forward: mean_abs=0.7265149354934692, max_abs=4.5, mean_rel=0.15101972222328186, max_rel=1259.0869140625, norm_rel=0.022086849436163902, ref_abs_avg=33.05670166015625, test_abs_avg=33.06031036376953
production_forward grad[13] vs paper_forward: mean_abs=0.7078348398208618, max_abs=4.75, mean_rel=0.1442502737045288, max_rel=608.4443359375, norm_rel=0.021450785920023918, ref_abs_avg=33.180259704589844, test_abs_avg=33.18388748168945
production_forward grad[14] vs paper_forward: mean_abs=0.5860629081726074, max_abs=1.875, mean_rel=0.06958933919668198, max_rel=4.89663553237915, norm_rel=0.021586598828434944, ref_abs_avg=26.606555938720703, test_abs_avg=26.56353759765625
production_forward grad[15] vs paper_forward: mean_abs=0.6838663816452026, max_abs=4.25, mean_rel=0.1477343589067459, max_rel=1543.07763671875, norm_rel=0.021667994558811188, ref_abs_avg=31.686622619628906, test_abs_avg=31.690366744995117
production_forward grad[16] vs paper_forward: mean_abs=0.6698057055473328, max_abs=4.3125, mean_rel=0.14557412266731262, max_rel=1092.5931396484375, norm_rel=0.021627822890877724, ref_abs_avg=31.149213790893555, test_abs_avg=31.15158462524414
production_forward grad[17] vs paper_forward: mean_abs=0.5123386383056641, max_abs=2.125, mean_rel=0.09840300679206848, max_rel=5.836175441741943, norm_rel=0.021300751715898514, ref_abs_avg=23.859525680541992, test_abs_avg=23.864591598510742
production_forward grad[18] vs paper_forward: mean_abs=0.6443193554878235, max_abs=4.0, mean_rel=0.13996320962905884, max_rel=1076.69482421875, norm_rel=0.021621109917759895, ref_abs_avg=29.939586639404297, test_abs_avg=29.943382263183594
production_forward grad[19] vs paper_forward: mean_abs=0.6330267190933228, max_abs=3.8125, mean_rel=0.14851172268390656, max_rel=961.73779296875, norm_rel=0.02134772017598152, ref_abs_avg=29.788314819335938, test_abs_avg=29.790983200073242
production_forward grad[20] vs paper_forward: mean_abs=0.4838399887084961, max_abs=2.125, mean_rel=0.0953221246600151, max_rel=7.027032852172852, norm_rel=0.021699000149965286, ref_abs_avg=23.034311294555664, test_abs_avg=23.066753387451172
production_forward grad[21] vs paper_forward: mean_abs=0.6155641078948975, max_abs=3.75, mean_rel=0.151887446641922, max_rel=714.1624145507812, norm_rel=0.021494388580322266, ref_abs_avg=28.780601501464844, test_abs_avg=28.78079605102539
production_forward grad[22] vs paper_forward: mean_abs=0.596436619758606, max_abs=3.5, mean_rel=0.14632312953472137, max_rel=1021.6041259765625, norm_rel=0.021074311807751656, ref_abs_avg=28.469703674316406, test_abs_avg=28.46815299987793
production_forward grad[23] vs paper_forward: mean_abs=0.4973796606063843, max_abs=2.0625, mean_rel=0.10191376507282257, max_rel=9.934736251831055, norm_rel=0.023557934910058975, ref_abs_avg=21.19402313232422, test_abs_avg=21.182392120361328
production_forward grad[24] vs paper_forward: mean_abs=0.5803911685943604, max_abs=3.9375, mean_rel=0.14329743385314941, max_rel=1290.9610595703125, norm_rel=0.021455276757478714, ref_abs_avg=27.148971557617188, test_abs_avg=27.151695251464844
production_forward grad[25] vs paper_forward: mean_abs=0.5699120759963989, max_abs=3.5, mean_rel=0.14695042371749878, max_rel=925.5557250976562, norm_rel=0.021195681765675545, ref_abs_avg=26.999244689941406, test_abs_avg=26.9990234375
production_forward grad[26] vs paper_forward: mean_abs=0.5343375205993652, max_abs=2.25, mean_rel=0.12377727031707764, max_rel=15.837427139282227, norm_rel=0.02339409664273262, ref_abs_avg=22.786113739013672, test_abs_avg=22.814035415649414
production_forward grad[27] vs paper_forward: mean_abs=0.6804297566413879, max_abs=5.5, mean_rel=0.16277331113815308, max_rel=1395.92529296875, norm_rel=0.02343422919511795, ref_abs_avg=29.164714813232422, test_abs_avg=29.167585372924805
production_forward grad[28] vs paper_forward: mean_abs=0.6725671887397766, max_abs=4.109375, mean_rel=0.16369764506816864, max_rel=792.4381103515625, norm_rel=0.023194458335638046, ref_abs_avg=29.08936309814453, test_abs_avg=29.089977264404297
production_forward grad[29] vs paper_forward: mean_abs=0.5646986961364746, max_abs=2.25, mean_rel=0.18653862178325653, max_rel=21.303138732910156, norm_rel=0.026714198291301727, ref_abs_avg=21.488231658935547, test_abs_avg=21.463735580444336
production_forward grad[30] vs paper_forward: mean_abs=0.6330764293670654, max_abs=4.5, mean_rel=0.1547115445137024, max_rel=775.492431640625, norm_rel=0.02378353476524353, ref_abs_avg=26.718082427978516, test_abs_avg=26.719444274902344
production_forward grad[31] vs paper_forward: mean_abs=0.6215436458587646, max_abs=3.8125, mean_rel=0.15489321947097778, max_rel=821.8517456054688, norm_rel=0.023677930235862732, ref_abs_avg=26.350622177124023, test_abs_avg=26.348289489746094
production_forward grad[32] vs paper_forward: mean_abs=0.4855961799621582, max_abs=2.0, mean_rel=0.09930002689361572, max_rel=13.218524932861328, norm_rel=0.024397408589720726, ref_abs_avg=20.175966262817383, test_abs_avg=20.179094314575195
production_forward grad[33] vs paper_forward: mean_abs=0.5851355791091919, max_abs=4.0, mean_rel=0.16134673357009888, max_rel=927.330078125, norm_rel=0.02355232648551464, ref_abs_avg=24.924352645874023, test_abs_avg=24.924680709838867
production_forward grad[34] vs paper_forward: mean_abs=0.5788965225219727, max_abs=3.875, mean_rel=0.15712282061576843, max_rel=906.0914306640625, norm_rel=0.023321401327848434, ref_abs_avg=24.91260528564453, test_abs_avg=24.911893844604492
production_forward grad[35] vs paper_forward: mean_abs=0.46816253662109375, max_abs=1.875, mean_rel=0.0708891898393631, max_rel=2.367290496826172, norm_rel=0.024753715842962265, ref_abs_avg=18.913673400878906, test_abs_avg=18.901012420654297
production_forward grad[36] vs paper_forward: mean_abs=0.5489468574523926, max_abs=3.625, mean_rel=0.15322333574295044, max_rel=1045.68994140625, norm_rel=0.023347651585936546, ref_abs_avg=23.573896408081055, test_abs_avg=23.575786590576172
production_forward grad[37] vs paper_forward: mean_abs=0.53980553150177, max_abs=3.125, mean_rel=0.16083377599716187, max_rel=1019.56396484375, norm_rel=0.02324764057993889, ref_abs_avg=23.306900024414062, test_abs_avg=23.310707092285156
production_forward grad[38] vs paper_forward: mean_abs=0.44505321979522705, max_abs=1.875, mean_rel=0.15032416582107544, max_rel=22.761701583862305, norm_rel=0.025117486715316772, ref_abs_avg=17.40027618408203, test_abs_avg=17.403215408325195
production_forward grad[39] vs paper_forward: mean_abs=0.5164220333099365, max_abs=3.75, mean_rel=0.14462599158287048, max_rel=540.709716796875, norm_rel=0.023218365386128426, ref_abs_avg=22.31418800354004, test_abs_avg=22.314144134521484
production_forward grad[40] vs paper_forward: mean_abs=0.5052907466888428, max_abs=3.25, mean_rel=0.14039425551891327, max_rel=588.3331909179688, norm_rel=0.02281443402171135, ref_abs_avg=22.23807144165039, test_abs_avg=22.238365173339844
production_forward grad[41] vs paper_forward: mean_abs=0.3990895748138428, max_abs=1.5625, mean_rel=0.0834868997335434, max_rel=7.902340412139893, norm_rel=0.022396603599190712, ref_abs_avg=17.811513900756836, test_abs_avg=17.87077522277832
production_forward grad[42] vs paper_forward: mean_abs=0.49212515354156494, max_abs=3.375, mean_rel=0.13926716148853302, max_rel=660.4306030273438, norm_rel=0.022961990907788277, ref_abs_avg=21.48455238342285, test_abs_avg=21.484004974365234
production_forward grad[43] vs paper_forward: mean_abs=0.48150795698165894, max_abs=3.25, mean_rel=0.14850035309791565, max_rel=922.8806762695312, norm_rel=0.02278975211083889, ref_abs_avg=21.206384658813477, test_abs_avg=21.20358657836914
production_forward grad[44] vs paper_forward: mean_abs=0.39348649978637695, max_abs=1.365234375, mean_rel=0.10670562088489532, max_rel=10.90085220336914, norm_rel=0.02346017025411129, ref_abs_avg=16.676340103149414, test_abs_avg=16.660385131835938
production_forward grad[45] vs paper_forward: mean_abs=0.46539413928985596, max_abs=3.3125, mean_rel=0.1462528109550476, max_rel=613.2609252929688, norm_rel=0.02249368280172348, ref_abs_avg=20.70207977294922, test_abs_avg=20.701953887939453
production_forward grad[46] vs paper_forward: mean_abs=0.4585874080657959, max_abs=2.96875, mean_rel=0.15314096212387085, max_rel=1641.87158203125, norm_rel=0.022548604756593704, ref_abs_avg=20.394086837768555, test_abs_avg=20.397544860839844
production_forward grad[47] vs paper_forward: mean_abs=0.3649924397468567, max_abs=1.5, mean_rel=0.16048617660999298, max_rel=36.97883987426758, norm_rel=0.02272125892341137, ref_abs_avg=15.986122131347656, test_abs_avg=15.978780746459961
production_forward grad[48] vs paper_forward: mean_abs=0.44397103786468506, max_abs=3.0, mean_rel=0.14437754452228546, max_rel=593.8509521484375, norm_rel=0.022502772510051727, ref_abs_avg=19.749088287353516, test_abs_avg=19.74753189086914
production_forward grad[49] vs paper_forward: mean_abs=0.43828433752059937, max_abs=2.75, mean_rel=0.1476704180240631, max_rel=921.7105712890625, norm_rel=0.022292807698249817, ref_abs_avg=19.713165283203125, test_abs_avg=19.71048927307129
production_forward grad[50] vs paper_forward: mean_abs=0.43786191940307617, max_abs=1.5, mean_rel=0.08498506247997284, max_rel=3.2593929767608643, norm_rel=0.025858229026198387, ref_abs_avg=17.19921112060547, test_abs_avg=17.20977783203125
production_forward grad[51] vs paper_forward: mean_abs=0.516448974609375, max_abs=3.25, mean_rel=0.15308085083961487, max_rel=899.89892578125, norm_rel=0.02384716458618641, ref_abs_avg=21.70978355407715, test_abs_avg=21.711872100830078
production_forward grad[52] vs paper_forward: mean_abs=0.510408878326416, max_abs=3.25, mean_rel=0.16081185638904572, max_rel=956.9097290039062, norm_rel=0.023972513154149055, ref_abs_avg=21.367347717285156, test_abs_avg=21.371795654296875
production_forward grad[53] vs paper_forward: mean_abs=0.3725738525390625, max_abs=1.375, mean_rel=0.12301845848560333, max_rel=10.018988609313965, norm_rel=0.021455999463796616, ref_abs_avg=17.45859146118164, test_abs_avg=17.45504379272461
production_forward grad[54] vs paper_forward: mean_abs=0.4733816385269165, max_abs=3.25, mean_rel=0.15806269645690918, max_rel=839.6109008789062, norm_rel=0.02325335517525673, ref_abs_avg=20.36489486694336, test_abs_avg=20.36705780029297
production_forward grad[55] vs paper_forward: mean_abs=0.46213752031326294, max_abs=3.0, mean_rel=0.14825889468193054, max_rel=682.8087158203125, norm_rel=0.023305019363760948, ref_abs_avg=19.92876434326172, test_abs_avg=19.933788299560547
production_forward grad[56] vs paper_forward: mean_abs=0.34579408168792725, max_abs=1.265625, mean_rel=0.1590963900089264, max_rel=24.00411605834961, norm_rel=0.021844137459993362, ref_abs_avg=15.82847785949707, test_abs_avg=15.849828720092773
production_forward grad[57] vs paper_forward: mean_abs=0.4364544451236725, max_abs=2.875, mean_rel=0.15165822207927704, max_rel=803.4962768554688, norm_rel=0.022863294929265976, ref_abs_avg=19.125770568847656, test_abs_avg=19.126190185546875
production_forward grad[58] vs paper_forward: mean_abs=0.4287150800228119, max_abs=3.25, mean_rel=0.14350281655788422, max_rel=443.2512512207031, norm_rel=0.022657841444015503, ref_abs_avg=18.968307495117188, test_abs_avg=18.970985412597656
production_forward grad[59] vs paper_forward: mean_abs=0.3056541085243225, max_abs=1.419921875, mean_rel=0.10567157715559006, max_rel=10.590336799621582, norm_rel=0.020482493564486504, ref_abs_avg=15.395845413208008, test_abs_avg=15.387626647949219
production_forward grad[60] vs paper_forward: mean_abs=0.40896159410476685, max_abs=2.75, mean_rel=0.1421736776828766, max_rel=633.8319702148438, norm_rel=0.02259747125208378, ref_abs_avg=18.09425163269043, test_abs_avg=18.09478187561035
production_forward grad[61] vs paper_forward: mean_abs=0.39913108944892883, max_abs=2.5, mean_rel=0.14099928736686707, max_rel=581.6771240234375, norm_rel=0.022280285134911537, ref_abs_avg=17.94189453125, test_abs_avg=17.94342041015625
production_forward grad[62] vs paper_forward: mean_abs=0.3062021732330322, max_abs=1.125, mean_rel=0.07511316239833832, max_rel=1.8892923593521118, norm_rel=0.02214827574789524, ref_abs_avg=13.747462272644043, test_abs_avg=13.743022918701172
production_forward grad[63] vs paper_forward: mean_abs=0.38255900144577026, max_abs=2.5, mean_rel=0.14253631234169006, max_rel=940.1900024414062, norm_rel=0.022153545171022415, ref_abs_avg=17.265810012817383, test_abs_avg=17.26468276977539
production_forward grad[64] vs paper_forward: mean_abs=0.3745778799057007, max_abs=2.625, mean_rel=0.13652648031711578, max_rel=448.9198913574219, norm_rel=0.022097136825323105, ref_abs_avg=16.966121673583984, test_abs_avg=16.966434478759766
production_forward grad[65] vs paper_forward: mean_abs=0.2804908752441406, max_abs=1.28125, mean_rel=0.055921122431755066, max_rel=1.345961093902588, norm_rel=0.02006562426686287, ref_abs_avg=14.296320915222168, test_abs_avg=14.277886390686035
production_forward grad[66] vs paper_forward: mean_abs=0.35605061054229736, max_abs=2.375, mean_rel=0.13703298568725586, max_rel=805.31103515625, norm_rel=0.021559976041316986, ref_abs_avg=16.532445907592773, test_abs_avg=16.532611846923828
production_forward grad[67] vs paper_forward: mean_abs=0.3546755313873291, max_abs=2.28125, mean_rel=0.13393418490886688, max_rel=577.952880859375, norm_rel=0.021490145474672318, ref_abs_avg=16.526321411132812, test_abs_avg=16.52701187133789
production_forward grad[68] vs paper_forward: mean_abs=0.2822103500366211, max_abs=1.0703125, mean_rel=0.12881113588809967, max_rel=18.042600631713867, norm_rel=0.02101794444024563, ref_abs_avg=13.685275077819824, test_abs_avg=13.666419982910156
production_forward grad[69] vs paper_forward: mean_abs=0.34369125962257385, max_abs=2.5, mean_rel=0.14195924997329712, max_rel=884.3699951171875, norm_rel=0.021242808550596237, ref_abs_avg=16.187175750732422, test_abs_avg=16.18679428100586
production_forward grad[70] vs paper_forward: mean_abs=0.3391845226287842, max_abs=3.125, mean_rel=0.13286346197128296, max_rel=588.418701171875, norm_rel=0.021035965532064438, ref_abs_avg=16.140270233154297, test_abs_avg=16.1458740234375
production_forward grad[71] vs paper_forward: mean_abs=0.2609395980834961, max_abs=1.125, mean_rel=0.08419162034988403, max_rel=5.8986592292785645, norm_rel=0.01986456848680973, ref_abs_avg=13.407135009765625, test_abs_avg=13.408913612365723
production_forward grad[72] vs paper_forward: mean_abs=0.3284454643726349, max_abs=2.5, mean_rel=0.13326343894004822, max_rel=552.32177734375, norm_rel=0.02093636430799961, ref_abs_avg=15.67212200164795, test_abs_avg=15.670666694641113
production_forward grad[73] vs paper_forward: mean_abs=0.32115474343299866, max_abs=2.328125, mean_rel=0.12045574188232422, max_rel=386.4267578125, norm_rel=0.02099158801138401, ref_abs_avg=15.316255569458008, test_abs_avg=15.314289093017578
production_forward grad[74] vs paper_forward: mean_abs=0.27844664454460144, max_abs=1.375, mean_rel=0.128500834107399, max_rel=12.301277160644531, norm_rel=0.023773137480020523, ref_abs_avg=11.973441123962402, test_abs_avg=11.94864273071289
production_forward grad[75] vs paper_forward: mean_abs=0.35099977254867554, max_abs=2.5, mean_rel=0.14005187153816223, max_rel=518.9605102539062, norm_rel=0.02315845526754856, ref_abs_avg=15.17931842803955, test_abs_avg=15.178544998168945
production_forward grad[76] vs paper_forward: mean_abs=0.3447735905647278, max_abs=2.5, mean_rel=0.14698120951652527, max_rel=1196.38623046875, norm_rel=0.02272459864616394, ref_abs_avg=15.19411849975586, test_abs_avg=15.191387176513672
production_forward grad[77] vs paper_forward: mean_abs=0.258750319480896, max_abs=1.0, mean_rel=0.0694626122713089, max_rel=2.8499109745025635, norm_rel=0.02236275002360344, ref_abs_avg=11.523183822631836, test_abs_avg=11.530344009399414
production_forward grad[78] vs paper_forward: mean_abs=0.3155621290206909, max_abs=2.25, mean_rel=0.14041367173194885, max_rel=1453.74267578125, norm_rel=0.02220921218395233, ref_abs_avg=14.199612617492676, test_abs_avg=14.199843406677246
production_forward grad[79] vs paper_forward: mean_abs=0.3151639401912689, max_abs=2.75, mean_rel=0.13744068145751953, max_rel=661.2846069335938, norm_rel=0.022255323827266693, ref_abs_avg=14.157151222229004, test_abs_avg=14.152641296386719
production_forward grad[80] vs paper_forward: mean_abs=0.24745464324951172, max_abs=0.953125, mean_rel=0.04755198210477829, max_rel=1.533861517906189, norm_rel=0.021123621612787247, ref_abs_avg=12.09807014465332, test_abs_avg=12.079998970031738
production_forward grad[81] vs paper_forward: mean_abs=0.29500794410705566, max_abs=2.5, mean_rel=0.13453266024589539, max_rel=714.0628051757812, norm_rel=0.02144431136548519, ref_abs_avg=13.736690521240234, test_abs_avg=13.736963272094727
production_forward grad[82] vs paper_forward: mean_abs=0.287261039018631, max_abs=2.4296875, mean_rel=0.13722197711467743, max_rel=1387.5372314453125, norm_rel=0.02125559188425541, ref_abs_avg=13.546258926391602, test_abs_avg=13.543163299560547
production_forward grad[83] vs paper_forward: mean_abs=0.23838160932064056, max_abs=0.875, mean_rel=0.1889830082654953, max_rel=50.7734375, norm_rel=0.020904576405882835, ref_abs_avg=11.382596969604492, test_abs_avg=11.378894805908203
production_forward grad[84] vs paper_forward: mean_abs=0.27926892042160034, max_abs=2.75, mean_rel=0.13446971774101257, max_rel=641.8294677734375, norm_rel=0.02088569849729538, ref_abs_avg=13.396707534790039, test_abs_avg=13.396920204162598
production_forward grad[85] vs paper_forward: mean_abs=0.2703045904636383, max_abs=2.5, mean_rel=0.12651079893112183, max_rel=422.5817565917969, norm_rel=0.020484942942857742, ref_abs_avg=13.223886489868164, test_abs_avg=13.225702285766602
production_forward grad[86] vs paper_forward: mean_abs=0.2130732536315918, max_abs=0.9609375, mean_rel=0.07836660742759705, max_rel=5.5112624168396, norm_rel=0.019468529149889946, ref_abs_avg=10.970518112182617, test_abs_avg=10.976926803588867
production_forward grad[87] vs paper_forward: mean_abs=0.2596728801727295, max_abs=2.375, mean_rel=0.12093118578195572, max_rel=755.9156494140625, norm_rel=0.02023620530962944, ref_abs_avg=12.884490013122559, test_abs_avg=12.884435653686523
production_forward grad[88] vs paper_forward: mean_abs=0.2557350993156433, max_abs=2.1875, mean_rel=0.11888962984085083, max_rel=394.4505920410156, norm_rel=0.019945397973060608, ref_abs_avg=12.873274803161621, test_abs_avg=12.871262550354004
production_forward grad[89] vs paper_forward: mean_abs=0.19954204559326172, max_abs=0.830078125, mean_rel=0.16138672828674316, max_rel=36.917999267578125, norm_rel=0.019424226135015488, ref_abs_avg=10.284322738647461, test_abs_avg=10.30273723602295
production_forward grad[90] vs paper_forward: mean_abs=0.2426137924194336, max_abs=2.5, mean_rel=0.11787399649620056, max_rel=357.1700439453125, norm_rel=0.019997308030724525, ref_abs_avg=12.222588539123535, test_abs_avg=12.221227645874023
production_forward grad[91] vs paper_forward: mean_abs=0.24080157279968262, max_abs=2.25, mean_rel=0.12453386187553406, max_rel=550.1370239257812, norm_rel=0.019948182627558708, ref_abs_avg=12.168198585510254, test_abs_avg=12.170001983642578
production_forward grad[92] vs paper_forward: mean_abs=0.19568467140197754, max_abs=0.75, mean_rel=0.05696273595094681, max_rel=2.6552212238311768, norm_rel=0.018811779096722603, ref_abs_avg=10.403371810913086, test_abs_avg=10.404413223266602
production_forward grad[93] vs paper_forward: mean_abs=0.22833651304244995, max_abs=2.40625, mean_rel=0.11441121995449066, max_rel=543.587646484375, norm_rel=0.019357366487383842, ref_abs_avg=11.935012817382812, test_abs_avg=11.934690475463867
production_forward grad[94] vs paper_forward: mean_abs=0.2220788598060608, max_abs=2.125, mean_rel=0.11934096366167068, max_rel=639.3511352539062, norm_rel=0.019165320321917534, ref_abs_avg=11.768007278442383, test_abs_avg=11.772327423095703
production_forward grad[95] vs paper_forward: mean_abs=0.1831662654876709, max_abs=0.75, mean_rel=0.14942291378974915, max_rel=22.409549713134766, norm_rel=0.018331969156861305, ref_abs_avg=10.195978164672852, test_abs_avg=10.187297821044922
production_forward grad[96] vs paper_forward: mean_abs=0.22105544805526733, max_abs=2.5625, mean_rel=0.11153291165828705, max_rel=513.0774536132812, norm_rel=0.018901946023106575, ref_abs_avg=11.89712142944336, test_abs_avg=11.896942138671875
production_forward grad[97] vs paper_forward: mean_abs=0.2112610638141632, max_abs=2.15625, mean_rel=0.10723184049129486, max_rel=383.35009765625, norm_rel=0.018327239900827408, ref_abs_avg=11.719603538513184, test_abs_avg=11.715314865112305
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0015806471928954124, max_abs=0.037109375
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00819912925362587, max_abs=0.390625, mean_rel=0.07257911562919617, max_rel=162.43565368652344, norm_rel=0.019903261214494705, ref_abs_avg=0.4447036385536194, test_abs_avg=0.4447057843208313
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=4.943301200866699, max_abs=40.0, mean_rel=0.17298318445682526, max_rel=282.42498779296875, norm_rel=0.01962776854634285, ref_abs_avg=216.8657684326172, test_abs_avg=216.8795623779297
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.8383712768554688, max_abs=3.25, mean_rel=0.07536394894123077, max_rel=2.8585357666015625, norm_rel=0.023602597415447235, ref_abs_avg=34.70317840576172, test_abs_avg=34.68843078613281
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.0217881202697754, max_abs=7.484375, mean_rel=0.1680881530046463, max_rel=1558.64599609375, norm_rel=0.023399854078888893, ref_abs_avg=43.88035583496094, test_abs_avg=43.88062286376953
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=0.9975588321685791, max_abs=6.0625, mean_rel=0.15095829963684082, max_rel=894.7481079101562, norm_rel=0.023229416459798813, ref_abs_avg=43.18003463745117, test_abs_avg=43.18296813964844
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.7441368103027344, max_abs=3.25, mean_rel=0.09155319631099701, max_rel=10.58423137664795, norm_rel=0.023704292252659798, ref_abs_avg=31.88287353515625, test_abs_avg=31.914913177490234
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9066362977027893, max_abs=6.0, mean_rel=0.16185227036476135, max_rel=1233.85205078125, norm_rel=0.02320203371345997, ref_abs_avg=39.260986328125, test_abs_avg=39.26393127441406
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.8857455253601074, max_abs=5.5, mean_rel=0.1696757674217224, max_rel=810.9600830078125, norm_rel=0.02301342412829399, ref_abs_avg=38.664772033691406, test_abs_avg=38.662879943847656
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.655775785446167, max_abs=2.5, mean_rel=0.12373344600200653, max_rel=10.809163093566895, norm_rel=0.02130594104528427, ref_abs_avg=30.479751586914062, test_abs_avg=30.449295043945312
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.8206847906112671, max_abs=5.0, mean_rel=0.15484841167926788, max_rel=1105.1485595703125, norm_rel=0.022848661988973618, ref_abs_avg=36.08742904663086, test_abs_avg=36.088356018066406
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.7977787852287292, max_abs=4.625, mean_rel=0.1416020393371582, max_rel=472.2407531738281, norm_rel=0.022479647770524025, ref_abs_avg=35.6722412109375, test_abs_avg=35.67451477050781
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6223058700561523, max_abs=2.625, mean_rel=0.08978326618671417, max_rel=4.226760387420654, norm_rel=0.023575738072395325, ref_abs_avg=26.688491821289062, test_abs_avg=26.66905975341797
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.7484613656997681, max_abs=5.25, mean_rel=0.15844503045082092, max_rel=1702.68310546875, norm_rel=0.022746654227375984, ref_abs_avg=33.05670166015625, test_abs_avg=33.0590934753418
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7336564660072327, max_abs=4.5, mean_rel=0.15065309405326843, max_rel=809.9306030273438, norm_rel=0.02220752090215683, ref_abs_avg=33.180259704589844, test_abs_avg=33.18329620361328
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.5925788879394531, max_abs=2.125, mean_rel=0.05989313870668411, max_rel=2.7404701709747314, norm_rel=0.02176862582564354, ref_abs_avg=26.606555938720703, test_abs_avg=26.54474639892578
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7032508850097656, max_abs=4.6640625, mean_rel=0.1522420048713684, max_rel=1783.0101318359375, norm_rel=0.02227453514933586, ref_abs_avg=31.686622619628906, test_abs_avg=31.690061569213867
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.6871299743652344, max_abs=4.375, mean_rel=0.1471797227859497, max_rel=861.893798828125, norm_rel=0.022204633802175522, ref_abs_avg=31.149213790893555, test_abs_avg=31.15353012084961
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.5276603698730469, max_abs=2.0, mean_rel=0.1075938493013382, max_rel=9.187932968139648, norm_rel=0.021831089630723, ref_abs_avg=23.859525680541992, test_abs_avg=23.871641159057617
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.6630702018737793, max_abs=4.25, mean_rel=0.1420125663280487, max_rel=1458.039306640625, norm_rel=0.022227736189961433, ref_abs_avg=29.939586639404297, test_abs_avg=29.94245719909668
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.6495802402496338, max_abs=4.5, mean_rel=0.16084954142570496, max_rel=1084.9190673828125, norm_rel=0.021915193647146225, ref_abs_avg=29.788314819335938, test_abs_avg=29.79059600830078
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.49393701553344727, max_abs=2.375, mean_rel=0.0899893194437027, max_rel=5.403458595275879, norm_rel=0.022165881469845772, ref_abs_avg=23.034311294555664, test_abs_avg=23.063610076904297
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6317760348320007, max_abs=4.25, mean_rel=0.15624040365219116, max_rel=1101.3411865234375, norm_rel=0.02205662801861763, ref_abs_avg=28.780601501464844, test_abs_avg=28.779460906982422
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6112340688705444, max_abs=3.5, mean_rel=0.14985787868499756, max_rel=1019.7078857421875, norm_rel=0.02159479632973671, ref_abs_avg=28.469703674316406, test_abs_avg=28.467824935913086
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.47817838191986084, max_abs=2.25, mean_rel=0.14570489525794983, max_rel=29.07416534423828, norm_rel=0.022606754675507545, ref_abs_avg=21.19402313232422, test_abs_avg=21.210224151611328
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.594922661781311, max_abs=3.75, mean_rel=0.14984361827373505, max_rel=1344.4783935546875, norm_rel=0.021968528628349304, ref_abs_avg=27.148971557617188, test_abs_avg=27.15135383605957
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.5835756659507751, max_abs=3.25, mean_rel=0.15080654621124268, max_rel=912.0989379882812, norm_rel=0.02170756086707115, ref_abs_avg=26.999244689941406, test_abs_avg=26.99791717529297
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.5388288497924805, max_abs=2.625, mean_rel=0.11719288676977158, max_rel=17.436878204345703, norm_rel=0.024098938331007957, ref_abs_avg=22.786113739013672, test_abs_avg=22.817638397216797
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.6981242895126343, max_abs=4.875, mean_rel=0.17240817844867706, max_rel=1931.0328369140625, norm_rel=0.02403961680829525, ref_abs_avg=29.164714813232422, test_abs_avg=29.167417526245117
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.6925210952758789, max_abs=4.25, mean_rel=0.16109588742256165, max_rel=520.2732543945312, norm_rel=0.023886092007160187, ref_abs_avg=29.08936309814453, test_abs_avg=29.090028762817383
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.574406623840332, max_abs=2.25, mean_rel=0.1713334023952484, max_rel=15.216035842895508, norm_rel=0.02663014456629753, ref_abs_avg=21.488231658935547, test_abs_avg=21.47331428527832
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.6476078033447266, max_abs=4.0, mean_rel=0.1596725434064865, max_rel=620.6475830078125, norm_rel=0.02433088794350624, ref_abs_avg=26.718082427978516, test_abs_avg=26.718786239624023
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6361831426620483, max_abs=4.0, mean_rel=0.15125125646591187, max_rel=800.9740600585938, norm_rel=0.024237139150500298, ref_abs_avg=26.350622177124023, test_abs_avg=26.349565505981445
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.494412899017334, max_abs=2.25, mean_rel=0.09401308000087738, max_rel=11.210420608520508, norm_rel=0.024777622893452644, ref_abs_avg=20.175966262817383, test_abs_avg=20.172882080078125
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.5977506041526794, max_abs=4.0, mean_rel=0.16687048971652985, max_rel=943.7811889648438, norm_rel=0.02406003326177597, ref_abs_avg=24.924352645874023, test_abs_avg=24.92416763305664
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.5917291641235352, max_abs=3.6875, mean_rel=0.16147905588150024, max_rel=1036.4962158203125, norm_rel=0.023828499019145966, ref_abs_avg=24.91260528564453, test_abs_avg=24.913074493408203
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.47057485580444336, max_abs=1.875, mean_rel=0.0935623049736023, max_rel=9.724124908447266, norm_rel=0.024629531428217888, ref_abs_avg=18.913673400878906, test_abs_avg=18.931976318359375
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.5589599609375, max_abs=3.75, mean_rel=0.15626415610313416, max_rel=1085.0733642578125, norm_rel=0.023781390860676765, ref_abs_avg=23.573896408081055, test_abs_avg=23.574905395507812
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5511370301246643, max_abs=3.5, mean_rel=0.1634598672389984, max_rel=1490.091552734375, norm_rel=0.023726128041744232, ref_abs_avg=23.306900024414062, test_abs_avg=23.311569213867188
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.4584876298904419, max_abs=2.0, mean_rel=0.17748019099235535, max_rel=28.697519302368164, norm_rel=0.025971677154302597, ref_abs_avg=17.40027618408203, test_abs_avg=17.416460037231445
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5260047912597656, max_abs=3.4609375, mean_rel=0.14800947904586792, max_rel=636.0636596679688, norm_rel=0.023647036403417587, ref_abs_avg=22.31418800354004, test_abs_avg=22.313629150390625
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5146108269691467, max_abs=3.25, mean_rel=0.1482657790184021, max_rel=833.345703125, norm_rel=0.02323324605822563, ref_abs_avg=22.23807144165039, test_abs_avg=22.239208221435547
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.412821888923645, max_abs=1.625, mean_rel=0.09206415712833405, max_rel=10.208102226257324, norm_rel=0.022933082655072212, ref_abs_avg=17.811513900756836, test_abs_avg=17.878353118896484
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5000849366188049, max_abs=3.4375, mean_rel=0.14329850673675537, max_rel=533.3689575195312, norm_rel=0.023337479680776596, ref_abs_avg=21.48455238342285, test_abs_avg=21.483600616455078
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.4903619885444641, max_abs=3.5, mean_rel=0.15696600079536438, max_rel=1199.17578125, norm_rel=0.02320905774831772, ref_abs_avg=21.206384658813477, test_abs_avg=21.203815460205078
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.40655630826950073, max_abs=1.5, mean_rel=0.12022619694471359, max_rel=17.61159896850586, norm_rel=0.024380071088671684, ref_abs_avg=16.676340103149414, test_abs_avg=16.661712646484375
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.47225722670555115, max_abs=3.6875, mean_rel=0.14879579842090607, max_rel=869.0073852539062, norm_rel=0.02281859703361988, ref_abs_avg=20.70207977294922, test_abs_avg=20.70157814025879
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.4652159810066223, max_abs=2.875, mean_rel=0.1542154848575592, max_rel=1704.216796875, norm_rel=0.022878050804138184, ref_abs_avg=20.394086837768555, test_abs_avg=20.396324157714844
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.36805033683776855, max_abs=1.5625, mean_rel=0.17636466026306152, max_rel=45.20465087890625, norm_rel=0.023290464654564857, ref_abs_avg=15.986122131347656, test_abs_avg=15.986634254455566
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.4500538110733032, max_abs=3.01953125, mean_rel=0.15046696364879608, max_rel=712.9170532226562, norm_rel=0.02279778942465782, ref_abs_avg=19.749088287353516, test_abs_avg=19.74759292602539
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.442796528339386, max_abs=2.84375, mean_rel=0.1485917717218399, max_rel=709.9923706054688, norm_rel=0.02252063900232315, ref_abs_avg=19.713165283203125, test_abs_avg=19.709888458251953
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.4490780830383301, max_abs=1.6875, mean_rel=0.09022517502307892, max_rel=2.5311412811279297, norm_rel=0.026598509401082993, ref_abs_avg=17.19921112060547, test_abs_avg=17.218673706054688
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5245841145515442, max_abs=3.5, mean_rel=0.15533088147640228, max_rel=842.6250610351562, norm_rel=0.02421105094254017, ref_abs_avg=21.70978355407715, test_abs_avg=21.712617874145508
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5178740620613098, max_abs=3.375, mean_rel=0.1623862087726593, max_rel=1189.5281982421875, norm_rel=0.024311751127243042, ref_abs_avg=21.367347717285156, test_abs_avg=21.369449615478516
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.3834652900695801, max_abs=1.375, mean_rel=0.11300354450941086, max_rel=10.41874885559082, norm_rel=0.021875452250242233, ref_abs_avg=17.45859146118164, test_abs_avg=17.45984649658203
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.48005416989326477, max_abs=3.375, mean_rel=0.1624627411365509, max_rel=905.1449584960938, norm_rel=0.02357836440205574, ref_abs_avg=20.36489486694336, test_abs_avg=20.36676025390625
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.46784406900405884, max_abs=3.3828125, mean_rel=0.14767152070999146, max_rel=693.33349609375, norm_rel=0.023591794073581696, ref_abs_avg=19.92876434326172, test_abs_avg=19.932331085205078
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.3598175048828125, max_abs=1.25, mean_rel=0.1416608989238739, max_rel=18.043004989624023, norm_rel=0.02254381775856018, ref_abs_avg=15.82847785949707, test_abs_avg=15.850442886352539
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.4429142475128174, max_abs=2.875, mean_rel=0.15080639719963074, max_rel=945.4727783203125, norm_rel=0.02317938767373562, ref_abs_avg=19.125770568847656, test_abs_avg=19.12630844116211
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.43767696619033813, max_abs=3.5, mean_rel=0.14831650257110596, max_rel=498.2485046386719, norm_rel=0.023102404549717903, ref_abs_avg=18.968307495117188, test_abs_avg=18.97088623046875
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.32656335830688477, max_abs=1.5625, mean_rel=0.12623244524002075, max_rel=16.88499641418457, norm_rel=0.02150643803179264, ref_abs_avg=15.395845413208008, test_abs_avg=15.390146255493164
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.41465556621551514, max_abs=2.75, mean_rel=0.14207346737384796, max_rel=778.028076171875, norm_rel=0.022898107767105103, ref_abs_avg=18.09425163269043, test_abs_avg=18.094242095947266
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.40670210123062134, max_abs=3.07421875, mean_rel=0.14283546805381775, max_rel=562.9949340820312, norm_rel=0.02269255369901657, ref_abs_avg=17.94189453125, test_abs_avg=17.941852569580078
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.30865269899368286, max_abs=1.375, mean_rel=0.07562239468097687, max_rel=2.030233144760132, norm_rel=0.02272450551390648, ref_abs_avg=13.747462272644043, test_abs_avg=13.73315715789795
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.3868889808654785, max_abs=3.15234375, mean_rel=0.14568956196308136, max_rel=701.7684936523438, norm_rel=0.022404314950108528, ref_abs_avg=17.265810012817383, test_abs_avg=17.2645206451416
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.38049086928367615, max_abs=2.625, mean_rel=0.14303793013095856, max_rel=633.526123046875, norm_rel=0.022434666752815247, ref_abs_avg=16.966121673583984, test_abs_avg=16.968685150146484
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.29070615768432617, max_abs=1.125, mean_rel=0.05789894983172417, max_rel=1.4920028448104858, norm_rel=0.02055680751800537, ref_abs_avg=14.296320915222168, test_abs_avg=14.281272888183594
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.3601204752922058, max_abs=2.625, mean_rel=0.14082123339176178, max_rel=983.8833618164062, norm_rel=0.0218051727861166, ref_abs_avg=16.532445907592773, test_abs_avg=16.532705307006836
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.35738950967788696, max_abs=2.625, mean_rel=0.1325020045042038, max_rel=674.78466796875, norm_rel=0.021648617461323738, ref_abs_avg=16.526321411132812, test_abs_avg=16.526582717895508
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.2809520959854126, max_abs=1.375, mean_rel=0.15784068405628204, max_rel=25.254831314086914, norm_rel=0.020807942375540733, ref_abs_avg=13.685275077819824, test_abs_avg=13.666692733764648
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.3471747040748596, max_abs=2.5, mean_rel=0.142887681722641, max_rel=661.8541870117188, norm_rel=0.021452277898788452, ref_abs_avg=16.187175750732422, test_abs_avg=16.187549591064453
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.3425019085407257, max_abs=2.875, mean_rel=0.13535207509994507, max_rel=566.0729370117188, norm_rel=0.02124316804111004, ref_abs_avg=16.140270233154297, test_abs_avg=16.14542007446289
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.26238059997558594, max_abs=1.0234375, mean_rel=0.08043570816516876, max_rel=3.1517271995544434, norm_rel=0.020245127379894257, ref_abs_avg=13.407135009765625, test_abs_avg=13.401548385620117
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3312082290649414, max_abs=3.0, mean_rel=0.13670185208320618, max_rel=620.9959716796875, norm_rel=0.02111223340034485, ref_abs_avg=15.67212200164795, test_abs_avg=15.671228408813477
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.3243078589439392, max_abs=2.375, mean_rel=0.12146088480949402, max_rel=460.250244140625, norm_rel=0.021223364397883415, ref_abs_avg=15.316255569458008, test_abs_avg=15.314027786254883
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.2860116958618164, max_abs=1.1259765625, mean_rel=0.12882153689861298, max_rel=14.05686092376709, norm_rel=0.02441989816725254, ref_abs_avg=11.973441123962402, test_abs_avg=11.947861671447754
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.3548414707183838, max_abs=2.5, mean_rel=0.14382986724376678, max_rel=636.9631958007812, norm_rel=0.02341289632022381, ref_abs_avg=15.17931842803955, test_abs_avg=15.17811107635498
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3507559299468994, max_abs=2.625, mean_rel=0.14744800329208374, max_rel=882.8551635742188, norm_rel=0.02314242534339428, ref_abs_avg=15.19411849975586, test_abs_avg=15.190210342407227
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.26028943061828613, max_abs=1.03125, mean_rel=0.06738170236349106, max_rel=2.0625627040863037, norm_rel=0.022461464628577232, ref_abs_avg=11.523183822631836, test_abs_avg=11.526483535766602
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.3189670145511627, max_abs=2.328125, mean_rel=0.14163954555988312, max_rel=1024.9090576171875, norm_rel=0.02244400605559349, ref_abs_avg=14.199612617492676, test_abs_avg=14.19999885559082
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.3182453513145447, max_abs=2.5, mean_rel=0.1357457935810089, max_rel=475.0716552734375, norm_rel=0.022442487999796867, ref_abs_avg=14.157151222229004, test_abs_avg=14.15178108215332
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.25278663635253906, max_abs=0.859375, mean_rel=0.05568375065922737, max_rel=1.854677438735962, norm_rel=0.02135472744703293, ref_abs_avg=12.09807014465332, test_abs_avg=12.088644027709961
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.29774609208106995, max_abs=2.75, mean_rel=0.1373245120048523, max_rel=790.5892333984375, norm_rel=0.021641971543431282, ref_abs_avg=13.736690521240234, test_abs_avg=13.737117767333984
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.28847163915634155, max_abs=2.125, mean_rel=0.14070072770118713, max_rel=1254.3995361328125, norm_rel=0.021319732069969177, ref_abs_avg=13.546258926391602, test_abs_avg=13.542560577392578
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.24848006665706635, max_abs=1.0, mean_rel=0.16451722383499146, max_rel=38.55118942260742, norm_rel=0.02186957746744156, ref_abs_avg=11.382596969604492, test_abs_avg=11.376463890075684
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.2811318039894104, max_abs=2.4375, mean_rel=0.13389694690704346, max_rel=529.7044067382812, norm_rel=0.021015668287873268, ref_abs_avg=13.396707534790039, test_abs_avg=13.397109031677246
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.2722667157649994, max_abs=2.75, mean_rel=0.12677344679832458, max_rel=444.1627502441406, norm_rel=0.020634885877370834, ref_abs_avg=13.223886489868164, test_abs_avg=13.227045059204102
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.2171182632446289, max_abs=0.8125, mean_rel=0.06892046332359314, max_rel=2.1912283897399902, norm_rel=0.019888633862137794, ref_abs_avg=10.970518112182617, test_abs_avg=10.978355407714844
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.26156383752822876, max_abs=2.6875, mean_rel=0.12455693632364273, max_rel=705.8653564453125, norm_rel=0.020367255434393883, ref_abs_avg=12.884490013122559, test_abs_avg=12.88457202911377
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.25574225187301636, max_abs=2.0, mean_rel=0.12216893583536148, max_rel=384.3865966796875, norm_rel=0.01994980312883854, ref_abs_avg=12.873274803161621, test_abs_avg=12.868545532226562
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.19950532913208008, max_abs=0.974609375, mean_rel=0.17029257118701935, max_rel=43.346073150634766, norm_rel=0.01950000412762165, ref_abs_avg=10.284322738647461, test_abs_avg=10.298681259155273
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.24393923580646515, max_abs=2.8125, mean_rel=0.12004223465919495, max_rel=575.4113159179688, norm_rel=0.020085079595446587, ref_abs_avg=12.222588539123535, test_abs_avg=12.221334457397461
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.23844707012176514, max_abs=2.125, mean_rel=0.12380777299404144, max_rel=463.61956787109375, norm_rel=0.019737087190151215, ref_abs_avg=12.168198585510254, test_abs_avg=12.170801162719727
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.19744396209716797, max_abs=0.9375, mean_rel=0.06762188673019409, max_rel=3.764648199081421, norm_rel=0.01913771964609623, ref_abs_avg=10.403371810913086, test_abs_avg=10.396014213562012
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.22911185026168823, max_abs=2.375, mean_rel=0.11237368732690811, max_rel=463.7640075683594, norm_rel=0.01943019963800907, ref_abs_avg=11.935012817382812, test_abs_avg=11.93496322631836
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.2251167595386505, max_abs=2.375, mean_rel=0.12091383337974548, max_rel=597.853759765625, norm_rel=0.019420087337493896, ref_abs_avg=11.768007278442383, test_abs_avg=11.771848678588867
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.18470144271850586, max_abs=0.75, mean_rel=0.18209189176559448, max_rel=37.60813522338867, norm_rel=0.018460489809513092, ref_abs_avg=10.195978164672852, test_abs_avg=10.185015678405762
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.22125555574893951, max_abs=2.4375, mean_rel=0.11129133403301239, max_rel=418.99591064453125, norm_rel=0.01892671175301075, ref_abs_avg=11.89712142944336, test_abs_avg=11.896618843078613
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.2135562151670456, max_abs=2.125, mean_rel=0.10862430930137634, max_rel=458.0165710449219, norm_rel=0.018498539924621582, ref_abs_avg=11.719603538513184, test_abs_avg=11.714717864990234
liger_forward vs paper_forward output: mean_abs=0.00014236735296435654, max_abs=0.0244140625
liger_forward grad[0] vs paper_forward: mean_abs=0.0034166378900408745, max_abs=0.19921875, mean_rel=0.025306615978479385, max_rel=68.69664001464844, norm_rel=0.009559211321175098, ref_abs_avg=0.4447036385536194, test_abs_avg=0.44467824697494507
liger_forward grad[1] vs paper_forward: mean_abs=1.4637051820755005, max_abs=16.0, mean_rel=0.09041867405176163, max_rel=618.9548950195312, norm_rel=0.0062395865097641945, ref_abs_avg=216.8657684326172, test_abs_avg=216.8748779296875
liger_forward grad[2] vs paper_forward: mean_abs=0.3177156448364258, max_abs=1.0, mean_rel=0.030378367751836777, max_rel=1.9297809600830078, norm_rel=0.00924695748835802, ref_abs_avg=34.70317840576172, test_abs_avg=34.695289611816406
liger_forward grad[3] vs paper_forward: mean_abs=0.3692552447319031, max_abs=2.5, mean_rel=0.06164418160915375, max_rel=654.7909545898438, norm_rel=0.008739596232771873, ref_abs_avg=43.88035583496094, test_abs_avg=43.87939453125
liger_forward grad[4] vs paper_forward: mean_abs=0.3551143407821655, max_abs=2.25, mean_rel=0.0561012364923954, max_rel=380.39013671875, norm_rel=0.008549737744033337, ref_abs_avg=43.18003463745117, test_abs_avg=43.180294036865234
liger_forward grad[5] vs paper_forward: mean_abs=0.28074169158935547, max_abs=1.25, mean_rel=0.027179379016160965, max_rel=1.4114151000976562, norm_rel=0.008891750127077103, ref_abs_avg=31.88287353515625, test_abs_avg=31.877323150634766
liger_forward grad[6] vs paper_forward: mean_abs=0.3214101195335388, max_abs=2.0, mean_rel=0.058973778039216995, max_rel=431.0144348144531, norm_rel=0.008504506200551987, ref_abs_avg=39.260986328125, test_abs_avg=39.261619567871094
liger_forward grad[7] vs paper_forward: mean_abs=0.30827730894088745, max_abs=2.0, mean_rel=0.05503515899181366, max_rel=303.4229431152344, norm_rel=0.00831556785851717, ref_abs_avg=38.664772033691406, test_abs_avg=38.66462707519531
liger_forward grad[8] vs paper_forward: mean_abs=0.2355058193206787, max_abs=1.0, mean_rel=0.1376439779996872, max_rel=37.15143585205078, norm_rel=0.008069789968430996, ref_abs_avg=30.479751586914062, test_abs_avg=30.492298126220703
liger_forward grad[9] vs paper_forward: mean_abs=0.28799253702163696, max_abs=2.0, mean_rel=0.05626006796956062, max_rel=541.3232421875, norm_rel=0.008308134973049164, ref_abs_avg=36.08742904663086, test_abs_avg=36.08729553222656
liger_forward grad[10] vs paper_forward: mean_abs=0.274757444858551, max_abs=2.0, mean_rel=0.05259145796298981, max_rel=292.8762512207031, norm_rel=0.00805109366774559, ref_abs_avg=35.6722412109375, test_abs_avg=35.672515869140625
liger_forward grad[11] vs paper_forward: mean_abs=0.2162197232246399, max_abs=1.0, mean_rel=0.031839385628700256, max_rel=1.9237065315246582, norm_rel=0.008452885784208775, ref_abs_avg=26.688491821289062, test_abs_avg=26.67150115966797
liger_forward grad[12] vs paper_forward: mean_abs=0.25934749841690063, max_abs=2.0, mean_rel=0.054116666316986084, max_rel=607.0494384765625, norm_rel=0.008185331709682941, ref_abs_avg=33.05670166015625, test_abs_avg=33.05638885498047
liger_forward grad[13] vs paper_forward: mean_abs=0.2497740089893341, max_abs=1.625, mean_rel=0.04860733449459076, max_rel=254.92562866210938, norm_rel=0.007885695435106754, ref_abs_avg=33.180259704589844, test_abs_avg=33.179222106933594
liger_forward grad[14] vs paper_forward: mean_abs=0.1996002197265625, max_abs=1.0, mean_rel=0.021291295066475868, max_rel=0.7077490091323853, norm_rel=0.007978669367730618, ref_abs_avg=26.606555938720703, test_abs_avg=26.595417022705078
liger_forward grad[15] vs paper_forward: mean_abs=0.24091961979866028, max_abs=1.5, mean_rel=0.048861026763916016, max_rel=333.41790771484375, norm_rel=0.0079487394541502, ref_abs_avg=31.686622619628906, test_abs_avg=31.686321258544922
liger_forward grad[16] vs paper_forward: mean_abs=0.23287495970726013, max_abs=1.5, mean_rel=0.04635397717356682, max_rel=123.74832916259766, norm_rel=0.007831052877008915, ref_abs_avg=31.149213790893555, test_abs_avg=31.14858627319336
liger_forward grad[17] vs paper_forward: mean_abs=0.19213294982910156, max_abs=0.75, mean_rel=0.05788200721144676, max_rel=7.525990962982178, norm_rel=0.00820049736648798, ref_abs_avg=23.859525680541992, test_abs_avg=23.854618072509766
liger_forward grad[18] vs paper_forward: mean_abs=0.2248215526342392, max_abs=1.5, mean_rel=0.0488288439810276, max_rel=551.8042602539062, norm_rel=0.007855492644011974, ref_abs_avg=29.939586639404297, test_abs_avg=29.939268112182617
liger_forward grad[19] vs paper_forward: mean_abs=0.21757280826568604, max_abs=1.5, mean_rel=0.055369503796100616, max_rel=349.75067138671875, norm_rel=0.007669602055102587, ref_abs_avg=29.788314819335938, test_abs_avg=29.78761863708496
liger_forward grad[20] vs paper_forward: mean_abs=0.16166448593139648, max_abs=0.75, mean_rel=0.029199983924627304, max_rel=1.100986361503601, norm_rel=0.0075117130763828754, ref_abs_avg=23.034311294555664, test_abs_avg=23.03873634338379
liger_forward grad[21] vs paper_forward: mean_abs=0.21203342080116272, max_abs=1.5, mean_rel=0.05436556041240692, max_rel=555.85595703125, norm_rel=0.007728599477559328, ref_abs_avg=28.780601501464844, test_abs_avg=28.779945373535156
liger_forward grad[22] vs paper_forward: mean_abs=0.20279017090797424, max_abs=1.2734375, mean_rel=0.0497843474149704, max_rel=438.8130187988281, norm_rel=0.00750600453466177, ref_abs_avg=28.469703674316406, test_abs_avg=28.469036102294922
liger_forward grad[23] vs paper_forward: mean_abs=0.16166523098945618, max_abs=0.75, mean_rel=0.05684252083301544, max_rel=13.778453826904297, norm_rel=0.008255407214164734, ref_abs_avg=21.19402313232422, test_abs_avg=21.18104362487793
liger_forward grad[24] vs paper_forward: mean_abs=0.1980055570602417, max_abs=1.25, mean_rel=0.04851093515753746, max_rel=528.339599609375, norm_rel=0.007649035658687353, ref_abs_avg=27.148971557617188, test_abs_avg=27.14924430847168
liger_forward grad[25] vs paper_forward: mean_abs=0.19078868627548218, max_abs=1.25, mean_rel=0.04821097105741501, max_rel=336.2471618652344, norm_rel=0.007451352197676897, ref_abs_avg=26.999244689941406, test_abs_avg=26.998321533203125
liger_forward grad[26] vs paper_forward: mean_abs=0.18103981018066406, max_abs=0.703125, mean_rel=0.03455036133527756, max_rel=3.041814088821411, norm_rel=0.008216407150030136, ref_abs_avg=22.786113739013672, test_abs_avg=22.782785415649414
liger_forward grad[27] vs paper_forward: mean_abs=0.2209627330303192, max_abs=1.5, mean_rel=0.0502861887216568, max_rel=378.0577392578125, norm_rel=0.007934970781207085, ref_abs_avg=29.164714813232422, test_abs_avg=29.16400909423828
liger_forward grad[28] vs paper_forward: mean_abs=0.2150239646434784, max_abs=1.5, mean_rel=0.05600874498486519, max_rel=240.43536376953125, norm_rel=0.0077656288631260395, ref_abs_avg=29.08936309814453, test_abs_avg=29.089651107788086
liger_forward grad[29] vs paper_forward: mean_abs=0.16971826553344727, max_abs=0.8125, mean_rel=0.03887927159667015, max_rel=3.5054008960723877, norm_rel=0.008225400932133198, ref_abs_avg=21.488231658935547, test_abs_avg=21.487625122070312
liger_forward grad[30] vs paper_forward: mean_abs=0.1961236298084259, max_abs=1.25, mean_rel=0.049375563859939575, max_rel=398.3594055175781, norm_rel=0.007696045562624931, ref_abs_avg=26.718082427978516, test_abs_avg=26.7179012298584
liger_forward grad[31] vs paper_forward: mean_abs=0.18886148929595947, max_abs=1.5, mean_rel=0.04500054568052292, max_rel=236.62001037597656, norm_rel=0.0075378078036010265, ref_abs_avg=26.350622177124023, test_abs_avg=26.350238800048828
liger_forward grad[32] vs paper_forward: mean_abs=0.13680028915405273, max_abs=0.625, mean_rel=0.021068451926112175, max_rel=0.9506585597991943, norm_rel=0.007479983381927013, ref_abs_avg=20.175966262817383, test_abs_avg=20.174949645996094
liger_forward grad[33] vs paper_forward: mean_abs=0.17772802710533142, max_abs=1.171875, mean_rel=0.04783283546566963, max_rel=380.0035705566406, norm_rel=0.007485043257474899, ref_abs_avg=24.924352645874023, test_abs_avg=24.92407989501953
liger_forward grad[34] vs paper_forward: mean_abs=0.1719399094581604, max_abs=1.125, mean_rel=0.04807380959391594, max_rel=363.88214111328125, norm_rel=0.007275618612766266, ref_abs_avg=24.91260528564453, test_abs_avg=24.91284942626953
liger_forward grad[35] vs paper_forward: mean_abs=0.13155889511108398, max_abs=0.625, mean_rel=0.02523375302553177, max_rel=3.395988941192627, norm_rel=0.007399467751383781, ref_abs_avg=18.913673400878906, test_abs_avg=18.918907165527344
liger_forward grad[36] vs paper_forward: mean_abs=0.16427955031394958, max_abs=1.0, mean_rel=0.04588836431503296, max_rel=307.42523193359375, norm_rel=0.007333024404942989, ref_abs_avg=23.573896408081055, test_abs_avg=23.57384490966797
liger_forward grad[37] vs paper_forward: mean_abs=0.15807785093784332, max_abs=1.0, mean_rel=0.04681817442178726, max_rel=286.86309814453125, norm_rel=0.007169337011873722, ref_abs_avg=23.306900024414062, test_abs_avg=23.30652618408203
liger_forward grad[38] vs paper_forward: mean_abs=0.1292632818222046, max_abs=0.5, mean_rel=0.07013268023729324, max_rel=20.891300201416016, norm_rel=0.007632562890648842, ref_abs_avg=17.40027618408203, test_abs_avg=17.401805877685547
liger_forward grad[39] vs paper_forward: mean_abs=0.15338410437107086, max_abs=1.125, mean_rel=0.044322848320007324, max_rel=399.481689453125, norm_rel=0.007241363637149334, ref_abs_avg=22.31418800354004, test_abs_avg=22.313751220703125
liger_forward grad[40] vs paper_forward: mean_abs=0.14698022603988647, max_abs=1.0, mean_rel=0.041532859206199646, max_rel=212.84909057617188, norm_rel=0.007010144181549549, ref_abs_avg=22.23807144165039, test_abs_avg=22.236812591552734
liger_forward grad[41] vs paper_forward: mean_abs=0.12454032897949219, max_abs=0.515625, mean_rel=0.03168933466076851, max_rel=3.7724008560180664, norm_rel=0.007585496176034212, ref_abs_avg=17.811513900756836, test_abs_avg=17.822629928588867
liger_forward grad[42] vs paper_forward: mean_abs=0.14482247829437256, max_abs=1.015625, mean_rel=0.04215586185455322, max_rel=152.11639404296875, norm_rel=0.007121798116713762, ref_abs_avg=21.48455238342285, test_abs_avg=21.484764099121094
liger_forward grad[43] vs paper_forward: mean_abs=0.1396520733833313, max_abs=1.0, mean_rel=0.04426299035549164, max_rel=177.1648406982422, norm_rel=0.006966848392039537, ref_abs_avg=21.206384658813477, test_abs_avg=21.205947875976562
liger_forward grad[44] vs paper_forward: mean_abs=0.11829423904418945, max_abs=0.51953125, mean_rel=0.03666190803050995, max_rel=5.457891941070557, norm_rel=0.007535594515502453, ref_abs_avg=16.676340103149414, test_abs_avg=16.662824630737305
liger_forward grad[45] vs paper_forward: mean_abs=0.13600459694862366, max_abs=1.0, mean_rel=0.04243091866374016, max_rel=463.9069519042969, norm_rel=0.0069434489123523235, ref_abs_avg=20.70207977294922, test_abs_avg=20.701684951782227
liger_forward grad[46] vs paper_forward: mean_abs=0.13069292902946472, max_abs=1.0, mean_rel=0.04336009547114372, max_rel=222.8235321044922, norm_rel=0.0068073938600718975, ref_abs_avg=20.394086837768555, test_abs_avg=20.393869400024414
liger_forward grad[47] vs paper_forward: mean_abs=0.10842204093933105, max_abs=0.5, mean_rel=0.03664984181523323, max_rel=3.0851147174835205, norm_rel=0.007196689955890179, ref_abs_avg=15.986122131347656, test_abs_avg=15.980810165405273
liger_forward grad[48] vs paper_forward: mean_abs=0.12849412858486176, max_abs=1.0, mean_rel=0.042402081191539764, max_rel=207.2374267578125, norm_rel=0.00688545685261488, ref_abs_avg=19.749088287353516, test_abs_avg=19.749019622802734
liger_forward grad[49] vs paper_forward: mean_abs=0.12395215034484863, max_abs=0.875, mean_rel=0.03948182240128517, max_rel=190.42710876464844, norm_rel=0.006690083537250757, ref_abs_avg=19.713165283203125, test_abs_avg=19.714012145996094
liger_forward grad[50] vs paper_forward: mean_abs=0.12962579727172852, max_abs=0.5, mean_rel=0.027993056923151016, max_rel=3.3639025688171387, norm_rel=0.007823011837899685, ref_abs_avg=17.19921112060547, test_abs_avg=17.199424743652344
liger_forward grad[51] vs paper_forward: mean_abs=0.15084198117256165, max_abs=1.25, mean_rel=0.04450220242142677, max_rel=242.4140167236328, norm_rel=0.007316115777939558, ref_abs_avg=21.70978355407715, test_abs_avg=21.70956039428711
liger_forward grad[52] vs paper_forward: mean_abs=0.14451584219932556, max_abs=1.0, mean_rel=0.04716016724705696, max_rel=223.79620361328125, norm_rel=0.007156457286328077, ref_abs_avg=21.367347717285156, test_abs_avg=21.36676025390625
liger_forward grad[53] vs paper_forward: mean_abs=0.11092519760131836, max_abs=0.5, mean_rel=0.031492460519075394, max_rel=2.348590850830078, norm_rel=0.006825590506196022, ref_abs_avg=17.45859146118164, test_abs_avg=17.45669174194336
liger_forward grad[54] vs paper_forward: mean_abs=0.13516399264335632, max_abs=1.0, mean_rel=0.04399549216032028, max_rel=298.54864501953125, norm_rel=0.007003572769463062, ref_abs_avg=20.36489486694336, test_abs_avg=20.36520767211914
liger_forward grad[55] vs paper_forward: mean_abs=0.1307186782360077, max_abs=1.0, mean_rel=0.042553309351205826, max_rel=205.84136962890625, norm_rel=0.006949183065444231, ref_abs_avg=19.92876434326172, test_abs_avg=19.928342819213867
liger_forward grad[56] vs paper_forward: mean_abs=0.10464340448379517, max_abs=0.4375, mean_rel=0.03827808052301407, max_rel=2.560675621032715, norm_rel=0.006859110202640295, ref_abs_avg=15.82847785949707, test_abs_avg=15.82461166381836
liger_forward grad[57] vs paper_forward: mean_abs=0.1255103498697281, max_abs=1.0, mean_rel=0.04215404763817787, max_rel=246.52667236328125, norm_rel=0.006937522906810045, ref_abs_avg=19.125770568847656, test_abs_avg=19.12531280517578
liger_forward grad[58] vs paper_forward: mean_abs=0.12106664478778839, max_abs=1.03125, mean_rel=0.04114319384098053, max_rel=173.5596160888672, norm_rel=0.006777754053473473, ref_abs_avg=18.968307495117188, test_abs_avg=18.96810531616211
liger_forward grad[59] vs paper_forward: mean_abs=0.09857940673828125, max_abs=0.375, mean_rel=0.02932238020002842, max_rel=2.4987242221832275, norm_rel=0.0069000013172626495, ref_abs_avg=15.395845413208008, test_abs_avg=15.393587112426758
liger_forward grad[60] vs paper_forward: mean_abs=0.11568319797515869, max_abs=1.0, mean_rel=0.039931632578372955, max_rel=193.23194885253906, norm_rel=0.006768323481082916, ref_abs_avg=18.09425163269043, test_abs_avg=18.09369659423828
liger_forward grad[61] vs paper_forward: mean_abs=0.11199702322483063, max_abs=1.0, mean_rel=0.03966718167066574, max_rel=185.86070251464844, norm_rel=0.006641014479100704, ref_abs_avg=17.94189453125, test_abs_avg=17.94101333618164
liger_forward grad[62] vs paper_forward: mean_abs=0.09366798400878906, max_abs=0.375, mean_rel=0.027706600725650787, max_rel=1.2798963785171509, norm_rel=0.007026646751910448, ref_abs_avg=13.747462272644043, test_abs_avg=13.744257926940918
liger_forward grad[63] vs paper_forward: mean_abs=0.10890393704175949, max_abs=1.0, mean_rel=0.041089728474617004, max_rel=255.30528259277344, norm_rel=0.006681529339402914, ref_abs_avg=17.265810012817383, test_abs_avg=17.265247344970703
liger_forward grad[64] vs paper_forward: mean_abs=0.10543350875377655, max_abs=0.75, mean_rel=0.0389895886182785, max_rel=137.7017364501953, norm_rel=0.006604520604014397, ref_abs_avg=16.966121673583984, test_abs_avg=16.965803146362305
liger_forward grad[65] vs paper_forward: mean_abs=0.08472633361816406, max_abs=0.375, mean_rel=0.020781265571713448, max_rel=2.2081642150878906, norm_rel=0.006371405441313982, ref_abs_avg=14.296320915222168, test_abs_avg=14.304116249084473
liger_forward grad[66] vs paper_forward: mean_abs=0.10159222781658173, max_abs=0.75, mean_rel=0.038408808410167694, max_rel=277.7041931152344, norm_rel=0.006538209971040487, ref_abs_avg=16.532445907592773, test_abs_avg=16.53177833557129
liger_forward grad[67] vs paper_forward: mean_abs=0.09838514775037766, max_abs=1.0, mean_rel=0.03754739463329315, max_rel=239.98524475097656, norm_rel=0.006370138842612505, ref_abs_avg=16.526321411132812, test_abs_avg=16.527101516723633
liger_forward grad[68] vs paper_forward: mean_abs=0.08028458058834076, max_abs=0.375, mean_rel=0.02807311713695526, max_rel=1.8761980533599854, norm_rel=0.00629953108727932, ref_abs_avg=13.685275077819824, test_abs_avg=13.686765670776367
liger_forward grad[69] vs paper_forward: mean_abs=0.09704486280679703, max_abs=0.75, mean_rel=0.04047376662492752, max_rel=163.4187469482422, norm_rel=0.006390336435288191, ref_abs_avg=16.187175750732422, test_abs_avg=16.186933517456055
liger_forward grad[70] vs paper_forward: mean_abs=0.09416486322879791, max_abs=0.75, mean_rel=0.03653741627931595, max_rel=115.15149688720703, norm_rel=0.006248221267014742, ref_abs_avg=16.140270233154297, test_abs_avg=16.13957977294922
liger_forward grad[71] vs paper_forward: mean_abs=0.07740497589111328, max_abs=0.375, mean_rel=0.027516711503267288, max_rel=2.240201234817505, norm_rel=0.006343061104416847, ref_abs_avg=13.407135009765625, test_abs_avg=13.398587226867676
liger_forward grad[72] vs paper_forward: mean_abs=0.09223145246505737, max_abs=1.0, mean_rel=0.0383853055536747, max_rel=156.3835906982422, norm_rel=0.00628025783225894, ref_abs_avg=15.67212200164795, test_abs_avg=15.672083854675293
liger_forward grad[73] vs paper_forward: mean_abs=0.08890457451343536, max_abs=0.75, mean_rel=0.034451477229595184, max_rel=80.36756134033203, norm_rel=0.006224159616976976, ref_abs_avg=15.316255569458008, test_abs_avg=15.315962791442871
liger_forward grad[74] vs paper_forward: mean_abs=0.08711767196655273, max_abs=0.375, mean_rel=0.034663114696741104, max_rel=1.673290729522705, norm_rel=0.007680319715291262, ref_abs_avg=11.973441123962402, test_abs_avg=11.976028442382812
liger_forward grad[75] vs paper_forward: mean_abs=0.10472843050956726, max_abs=0.875, mean_rel=0.0438595712184906, max_rel=206.1462860107422, norm_rel=0.007265984546393156, ref_abs_avg=15.17931842803955, test_abs_avg=15.178433418273926
liger_forward grad[76] vs paper_forward: mean_abs=0.10255834460258484, max_abs=0.875, mean_rel=0.043197151273489, max_rel=184.32028198242188, norm_rel=0.007131977938115597, ref_abs_avg=15.19411849975586, test_abs_avg=15.193902015686035
liger_forward grad[77] vs paper_forward: mean_abs=0.08028674125671387, max_abs=0.32421875, mean_rel=0.022990984842181206, max_rel=0.7479503154754639, norm_rel=0.007400790695101023, ref_abs_avg=11.523183822631836, test_abs_avg=11.523059844970703
liger_forward grad[78] vs paper_forward: mean_abs=0.093487448990345, max_abs=1.0, mean_rel=0.04013717174530029, max_rel=205.8422088623047, norm_rel=0.006942796986550093, ref_abs_avg=14.199612617492676, test_abs_avg=14.199328422546387
liger_forward grad[79] vs paper_forward: mean_abs=0.09132177382707596, max_abs=1.0, mean_rel=0.03944796323776245, max_rel=236.65914916992188, norm_rel=0.006819374859333038, ref_abs_avg=14.157151222229004, test_abs_avg=14.156643867492676
liger_forward grad[80] vs paper_forward: mean_abs=0.06602048873901367, max_abs=0.25, mean_rel=0.017331484705209732, max_rel=1.7198516130447388, norm_rel=0.0061073401011526585, ref_abs_avg=12.09807014465332, test_abs_avg=12.093984603881836
liger_forward grad[81] vs paper_forward: mean_abs=0.0866629034280777, max_abs=0.75, mean_rel=0.04067126661539078, max_rel=206.16224670410156, norm_rel=0.006674525793641806, ref_abs_avg=13.736690521240234, test_abs_avg=13.735967636108398
liger_forward grad[82] vs paper_forward: mean_abs=0.08371034264564514, max_abs=0.75, mean_rel=0.039576198905706406, max_rel=126.85481262207031, norm_rel=0.0065887365490198135, ref_abs_avg=13.546258926391602, test_abs_avg=13.546772003173828
liger_forward grad[83] vs paper_forward: mean_abs=0.06509019434452057, max_abs=0.3125, mean_rel=0.041439175605773926, max_rel=8.40560245513916, norm_rel=0.0063506620936095715, ref_abs_avg=11.382596969604492, test_abs_avg=11.382498741149902
liger_forward grad[84] vs paper_forward: mean_abs=0.08119437843561172, max_abs=0.75, mean_rel=0.038023170083761215, max_rel=209.41220092773438, norm_rel=0.006453330162912607, ref_abs_avg=13.396707534790039, test_abs_avg=13.396329879760742
liger_forward grad[85] vs paper_forward: mean_abs=0.07806778699159622, max_abs=1.0, mean_rel=0.0359434112906456, max_rel=146.87838745117188, norm_rel=0.00631676334887743, ref_abs_avg=13.223886489868164, test_abs_avg=13.223608016967773
liger_forward grad[86] vs paper_forward: mean_abs=0.0652230978012085, max_abs=0.25, mean_rel=0.01957586407661438, max_rel=0.8324005603790283, norm_rel=0.0064265914261341095, ref_abs_avg=10.970518112182617, test_abs_avg=10.9725980758667
liger_forward grad[87] vs paper_forward: mean_abs=0.07542762160301208, max_abs=1.0, mean_rel=0.03610867261886597, max_rel=179.8629608154297, norm_rel=0.0062787593342363834, ref_abs_avg=12.884490013122559, test_abs_avg=12.884161949157715
liger_forward grad[88] vs paper_forward: mean_abs=0.07197311520576477, max_abs=1.0, mean_rel=0.03624610975384712, max_rel=196.35333251953125, norm_rel=0.006017296575009823, ref_abs_avg=12.873274803161621, test_abs_avg=12.87343692779541
liger_forward grad[89] vs paper_forward: mean_abs=0.062499046325683594, max_abs=0.25, mean_rel=0.031178977340459824, max_rel=2.77970814704895, norm_rel=0.006429600995033979, ref_abs_avg=10.284322738647461, test_abs_avg=10.280924797058105
liger_forward grad[90] vs paper_forward: mean_abs=0.0697970762848854, max_abs=1.0, mean_rel=0.03587964549660683, max_rel=213.501953125, norm_rel=0.006169992033392191, ref_abs_avg=12.222588539123535, test_abs_avg=12.222110748291016
liger_forward grad[91] vs paper_forward: mean_abs=0.06874285638332367, max_abs=0.75, mean_rel=0.03545273840427399, max_rel=156.86773681640625, norm_rel=0.0061278813518583775, ref_abs_avg=12.168198585510254, test_abs_avg=12.168095588684082
liger_forward grad[92] vs paper_forward: mean_abs=0.055747270584106445, max_abs=0.25, mean_rel=0.02152286097407341, max_rel=1.2136825323104858, norm_rel=0.005746461451053619, ref_abs_avg=10.403371810913086, test_abs_avg=10.401822090148926
liger_forward grad[93] vs paper_forward: mean_abs=0.06573845446109772, max_abs=0.75, mean_rel=0.03273392468690872, max_rel=128.6055145263672, norm_rel=0.00599389523267746, ref_abs_avg=11.935012817382812, test_abs_avg=11.934852600097656
liger_forward grad[94] vs paper_forward: mean_abs=0.06403667479753494, max_abs=0.75, mean_rel=0.034306637942790985, max_rel=144.33763122558594, norm_rel=0.005959671922028065, ref_abs_avg=11.768007278442383, test_abs_avg=11.766877174377441
liger_forward grad[95] vs paper_forward: mean_abs=0.057892054319381714, max_abs=0.25, mean_rel=0.041495054960250854, max_rel=7.665980815887451, norm_rel=0.006177048664540052, ref_abs_avg=10.195978164672852, test_abs_avg=10.193349838256836
liger_forward grad[96] vs paper_forward: mean_abs=0.06344673782587051, max_abs=0.75, mean_rel=0.03199611231684685, max_rel=92.46922302246094, norm_rel=0.005860684905201197, ref_abs_avg=11.89712142944336, test_abs_avg=11.89681625366211
liger_forward grad[97] vs paper_forward: mean_abs=0.060205403715372086, max_abs=0.625, mean_rel=0.03010730817914009, max_rel=129.15725708007812, norm_rel=0.005670970771461725, ref_abs_avg=11.719603538513184, test_abs_avg=11.717641830444336
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  48.526 ms
torch_compile_phases_forward bwd-only: 39.341 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.752 GiB
liger_forward fwd+bwd:  45.238 ms
liger_forward bwd-only: 32.875 ms
liger_forward peak allocated: fwd=7.727 GiB, fwd+bwd=7.727 GiB
liger_forward peak reserved:  fwd=7.775 GiB, fwd+bwd=8.088 GiB
paper_forward fwd+bwd:  112.813 ms
paper_forward bwd-only: 89.054 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB
production_forward fwd+bwd:  33.805 ms
production_forward bwd-only: 28.863 ms
production_forward peak allocated: fwd=1.174 GiB, fwd+bwd=5.176 GiB
production_forward peak reserved:  fwd=1.244 GiB, fwd+bwd=5.244 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.00166352279484272, max_abs=0.0419921875
production_forward grad[0] vs paper_forward: mean_abs=0.008379857987165451, max_abs=0.3125, mean_rel=0.07207328081130981, max_rel=90.22664642333984, norm_rel=0.019716544076800346, ref_abs_avg=0.4605426788330078, test_abs_avg=0.46055448055267334
production_forward grad[1] vs paper_forward: mean_abs=5.108944416046143, max_abs=55.0, mean_rel=0.1269531548023224, max_rel=101.69941711425781, norm_rel=0.02008632943034172, ref_abs_avg=228.8763427734375, test_abs_avg=228.8747100830078
production_forward grad[2] vs paper_forward: mean_abs=0.901914119720459, max_abs=3.75, mean_rel=0.3943414092063904, max_rel=164.18959045410156, norm_rel=0.022084234282374382, ref_abs_avg=40.4653205871582, test_abs_avg=40.443687438964844
production_forward grad[3] vs paper_forward: mean_abs=1.0784242153167725, max_abs=7.25, mean_rel=0.15726840496063232, max_rel=931.8827514648438, norm_rel=0.02300163358449936, ref_abs_avg=47.12229919433594, test_abs_avg=47.12443923950195
production_forward grad[4] vs paper_forward: mean_abs=1.055347204208374, max_abs=7.0, mean_rel=0.1609538197517395, max_rel=1029.2100830078125, norm_rel=0.02273324318230152, ref_abs_avg=46.71839141845703, test_abs_avg=46.723876953125
production_forward grad[5] vs paper_forward: mean_abs=0.7710232734680176, max_abs=3.09765625, mean_rel=0.12416896224021912, max_rel=10.490220069885254, norm_rel=0.023968899622559547, ref_abs_avg=31.441139221191406, test_abs_avg=31.428585052490234
production_forward grad[6] vs paper_forward: mean_abs=0.9284039735794067, max_abs=5.75, mean_rel=0.16345354914665222, max_rel=2569.544921875, norm_rel=0.022616617381572723, ref_abs_avg=41.23273468017578, test_abs_avg=41.2336311340332
production_forward grad[7] vs paper_forward: mean_abs=0.9055498242378235, max_abs=6.1328125, mean_rel=0.14556747674942017, max_rel=1494.5621337890625, norm_rel=0.02239932492375374, ref_abs_avg=40.63377380371094, test_abs_avg=40.63806915283203
production_forward grad[8] vs paper_forward: mean_abs=0.6886774301528931, max_abs=2.5, mean_rel=0.07483179867267609, max_rel=5.498067855834961, norm_rel=0.021618736907839775, ref_abs_avg=32.28358840942383, test_abs_avg=32.327484130859375
production_forward grad[9] vs paper_forward: mean_abs=0.8491147756576538, max_abs=5.0, mean_rel=0.16922084987163544, max_rel=1637.4722900390625, norm_rel=0.022411789745092392, ref_abs_avg=38.07654571533203, test_abs_avg=38.078399658203125
production_forward grad[10] vs paper_forward: mean_abs=0.8298985958099365, max_abs=4.75, mean_rel=0.14462795853614807, max_rel=1816.431884765625, norm_rel=0.02221347577869892, ref_abs_avg=37.54872512817383, test_abs_avg=37.55335998535156
production_forward grad[11] vs paper_forward: mean_abs=0.648655891418457, max_abs=3.0, mean_rel=0.06347321718931198, max_rel=1.9649488925933838, norm_rel=0.022325143218040466, ref_abs_avg=29.19913101196289, test_abs_avg=29.201534271240234
production_forward grad[12] vs paper_forward: mean_abs=0.7839428782463074, max_abs=5.0, mean_rel=0.16431860625743866, max_rel=1986.3594970703125, norm_rel=0.02228265441954136, ref_abs_avg=35.33815002441406, test_abs_avg=35.34318542480469
production_forward grad[13] vs paper_forward: mean_abs=0.7629570364952087, max_abs=4.75, mean_rel=0.14996452629566193, max_rel=1408.24951171875, norm_rel=0.02200310491025448, ref_abs_avg=34.88893508911133, test_abs_avg=34.892486572265625
production_forward grad[14] vs paper_forward: mean_abs=0.5923069715499878, max_abs=2.125, mean_rel=0.11151120066642761, max_rel=11.110078811645508, norm_rel=0.021398790180683136, ref_abs_avg=27.74383544921875, test_abs_avg=27.746105194091797
production_forward grad[15] vs paper_forward: mean_abs=0.7335167527198792, max_abs=5.0, mean_rel=0.15257635712623596, max_rel=1883.9970703125, norm_rel=0.022096365690231323, ref_abs_avg=33.36769104003906, test_abs_avg=33.369110107421875
production_forward grad[16] vs paper_forward: mean_abs=0.7175731658935547, max_abs=4.25, mean_rel=0.15455923974514008, max_rel=1290.0289306640625, norm_rel=0.02184726484119892, ref_abs_avg=33.01008224487305, test_abs_avg=33.01062774658203
production_forward grad[17] vs paper_forward: mean_abs=0.576641321182251, max_abs=2.0, mean_rel=0.10215912759304047, max_rel=7.008848667144775, norm_rel=0.022951601073145866, ref_abs_avg=24.566938400268555, test_abs_avg=24.57032585144043
production_forward grad[18] vs paper_forward: mean_abs=0.6965099573135376, max_abs=4.5, mean_rel=0.15127238631248474, max_rel=1504.5616455078125, norm_rel=0.022024376317858696, ref_abs_avg=31.79619598388672, test_abs_avg=31.798063278198242
production_forward grad[19] vs paper_forward: mean_abs=0.6784148216247559, max_abs=4.0, mean_rel=0.1519683301448822, max_rel=958.98828125, norm_rel=0.021642349660396576, ref_abs_avg=31.49234390258789, test_abs_avg=31.492631912231445
production_forward grad[20] vs paper_forward: mean_abs=0.5649511814117432, max_abs=2.125, mean_rel=0.08596444129943848, max_rel=6.036452770233154, norm_rel=0.02183712273836136, ref_abs_avg=25.639265060424805, test_abs_avg=25.654842376708984
production_forward grad[21] vs paper_forward: mean_abs=0.6521984338760376, max_abs=4.0, mean_rel=0.14642149209976196, max_rel=1364.4915771484375, norm_rel=0.021801568567752838, ref_abs_avg=30.079837799072266, test_abs_avg=30.079498291015625
production_forward grad[22] vs paper_forward: mean_abs=0.6393349170684814, max_abs=4.125, mean_rel=0.15044352412223816, max_rel=2276.68603515625, norm_rel=0.021461820229887962, ref_abs_avg=29.975627899169922, test_abs_avg=29.97738265991211
production_forward grad[23] vs paper_forward: mean_abs=0.47813987731933594, max_abs=2.1875, mean_rel=0.11054690182209015, max_rel=17.76256561279297, norm_rel=0.020831888541579247, ref_abs_avg=23.20960807800293, test_abs_avg=23.199569702148438
production_forward grad[24] vs paper_forward: mean_abs=0.6201862692832947, max_abs=4.0, mean_rel=0.14269647002220154, max_rel=897.6026000976562, norm_rel=0.02172458916902542, ref_abs_avg=28.689579010009766, test_abs_avg=28.691205978393555
production_forward grad[25] vs paper_forward: mean_abs=0.604860782623291, max_abs=3.75, mean_rel=0.16062583029270172, max_rel=1022.4013061523438, norm_rel=0.02117973007261753, ref_abs_avg=28.690685272216797, test_abs_avg=28.686798095703125
production_forward grad[26] vs paper_forward: mean_abs=0.5808590054512024, max_abs=3.0, mean_rel=0.4518956243991852, max_rel=175.6417694091797, norm_rel=0.02328767068684101, ref_abs_avg=25.088687896728516, test_abs_avg=25.18903350830078
production_forward grad[27] vs paper_forward: mean_abs=0.7102311849594116, max_abs=4.25, mean_rel=0.1724207103252411, max_rel=1381.8665771484375, norm_rel=0.023271817713975906, ref_abs_avg=30.647388458251953, test_abs_avg=30.647621154785156
production_forward grad[28] vs paper_forward: mean_abs=0.691656768321991, max_abs=4.25, mean_rel=0.14715313911437988, max_rel=573.8308715820312, norm_rel=0.0227938462048769, ref_abs_avg=30.443103790283203, test_abs_avg=30.44571304321289
production_forward grad[29] vs paper_forward: mean_abs=0.5371564626693726, max_abs=2.5, mean_rel=0.17926953732967377, max_rel=55.699371337890625, norm_rel=0.023308949545025826, ref_abs_avg=22.716825485229492, test_abs_avg=22.69635772705078
production_forward grad[30] vs paper_forward: mean_abs=0.6637523174285889, max_abs=4.5, mean_rel=0.16197773814201355, max_rel=1384.7427978515625, norm_rel=0.023646462708711624, ref_abs_avg=28.162933349609375, test_abs_avg=28.164962768554688
production_forward grad[31] vs paper_forward: mean_abs=0.6503852009773254, max_abs=4.25, mean_rel=0.15870682895183563, max_rel=923.546875, norm_rel=0.02345549501478672, ref_abs_avg=27.818199157714844, test_abs_avg=27.824264526367188
production_forward grad[32] vs paper_forward: mean_abs=0.5150351524353027, max_abs=2.25, mean_rel=0.11116951704025269, max_rel=18.83376121520996, norm_rel=0.023401958867907524, ref_abs_avg=22.65468978881836, test_abs_avg=22.655223846435547
production_forward grad[33] vs paper_forward: mean_abs=0.6176050901412964, max_abs=4.0, mean_rel=0.14974841475486755, max_rel=857.8855590820312, norm_rel=0.023564664646983147, ref_abs_avg=26.30065155029297, test_abs_avg=26.303504943847656
production_forward grad[34] vs paper_forward: mean_abs=0.6051341891288757, max_abs=4.0, mean_rel=0.14808207750320435, max_rel=874.1725463867188, norm_rel=0.0235625971108675, ref_abs_avg=25.792686462402344, test_abs_avg=25.795825958251953
production_forward grad[35] vs paper_forward: mean_abs=0.46848297119140625, max_abs=1.78125, mean_rel=0.11048462986946106, max_rel=5.890194892883301, norm_rel=0.02390887774527073, ref_abs_avg=19.876972198486328, test_abs_avg=19.90926742553711
production_forward grad[36] vs paper_forward: mean_abs=0.5789732933044434, max_abs=4.0078125, mean_rel=0.15995901823043823, max_rel=1162.8511962890625, norm_rel=0.023400278761982918, ref_abs_avg=24.794673919677734, test_abs_avg=24.795948028564453
production_forward grad[37] vs paper_forward: mean_abs=0.5705393552780151, max_abs=3.5, mean_rel=0.15888094902038574, max_rel=726.6344604492188, norm_rel=0.023182589560747147, ref_abs_avg=24.675701141357422, test_abs_avg=24.675968170166016
production_forward grad[38] vs paper_forward: mean_abs=0.4365386962890625, max_abs=1.875, mean_rel=0.08251935988664627, max_rel=6.026438236236572, norm_rel=0.022393343970179558, ref_abs_avg=19.830127716064453, test_abs_avg=19.81458282470703
production_forward grad[39] vs paper_forward: mean_abs=0.5452750325202942, max_abs=3.2734375, mean_rel=0.16308841109275818, max_rel=883.8492431640625, norm_rel=0.0230725035071373, ref_abs_avg=23.669170379638672, test_abs_avg=23.67144775390625
production_forward grad[40] vs paper_forward: mean_abs=0.5376452207565308, max_abs=3.5, mean_rel=0.15192720293998718, max_rel=697.4414672851562, norm_rel=0.02299194596707821, ref_abs_avg=23.465896606445312, test_abs_avg=23.46860122680664
production_forward grad[41] vs paper_forward: mean_abs=0.4280214309692383, max_abs=1.71875, mean_rel=0.09767626225948334, max_rel=9.245465278625488, norm_rel=0.022553246468305588, ref_abs_avg=19.259944915771484, test_abs_avg=19.285146713256836
production_forward grad[42] vs paper_forward: mean_abs=0.5197265148162842, max_abs=3.5, mean_rel=0.15806077420711517, max_rel=852.6958618164062, norm_rel=0.02284112758934498, ref_abs_avg=22.7852783203125, test_abs_avg=22.786779403686523
production_forward grad[43] vs paper_forward: mean_abs=0.51041579246521, max_abs=3.125, mean_rel=0.15237554907798767, max_rel=1464.5391845703125, norm_rel=0.02257116697728634, ref_abs_avg=22.652517318725586, test_abs_avg=22.656234741210938
production_forward grad[44] vs paper_forward: mean_abs=0.42289066314697266, max_abs=1.875, mean_rel=0.07719387114048004, max_rel=4.899806499481201, norm_rel=0.023235486820340157, ref_abs_avg=17.874767303466797, test_abs_avg=17.88501739501953
production_forward grad[45] vs paper_forward: mean_abs=0.4940120577812195, max_abs=3.1875, mean_rel=0.1455407440662384, max_rel=1021.9345092773438, norm_rel=0.02260103076696396, ref_abs_avg=21.885013580322266, test_abs_avg=21.88686752319336
production_forward grad[46] vs paper_forward: mean_abs=0.48798102140426636, max_abs=3.5, mean_rel=0.14191868901252747, max_rel=592.2760009765625, norm_rel=0.022315850481390953, ref_abs_avg=21.906753540039062, test_abs_avg=21.909473419189453
production_forward grad[47] vs paper_forward: mean_abs=0.3687615394592285, max_abs=1.375, mean_rel=0.07295652478933334, max_rel=5.1827569007873535, norm_rel=0.02104254625737667, ref_abs_avg=17.578731536865234, test_abs_avg=17.603517532348633
production_forward grad[48] vs paper_forward: mean_abs=0.47430384159088135, max_abs=3.125, mean_rel=0.14358678460121155, max_rel=773.2720336914062, norm_rel=0.022328298538923264, ref_abs_avg=21.270282745361328, test_abs_avg=21.272781372070312
production_forward grad[49] vs paper_forward: mean_abs=0.4665071964263916, max_abs=3.375, mean_rel=0.13006740808486938, max_rel=438.7271423339844, norm_rel=0.021826205775141716, ref_abs_avg=21.40924835205078, test_abs_avg=21.40826988220215
production_forward grad[50] vs paper_forward: mean_abs=0.465366005897522, max_abs=2.0, mean_rel=0.2772613763809204, max_rel=80.20748138427734, norm_rel=0.025385061278939247, ref_abs_avg=18.148914337158203, test_abs_avg=18.170122146606445
production_forward grad[51] vs paper_forward: mean_abs=0.5334144234657288, max_abs=3.625, mean_rel=0.15816181898117065, max_rel=991.05517578125, norm_rel=0.024496566504240036, ref_abs_avg=21.814889907836914, test_abs_avg=21.817485809326172
production_forward grad[52] vs paper_forward: mean_abs=0.5223775506019592, max_abs=3.171875, mean_rel=0.1620214581489563, max_rel=538.3858032226562, norm_rel=0.024108635261654854, ref_abs_avg=21.705650329589844, test_abs_avg=21.70840072631836
production_forward grad[53] vs paper_forward: mean_abs=0.4000258445739746, max_abs=1.9375, mean_rel=0.08763544261455536, max_rel=4.389936923980713, norm_rel=0.023415129631757736, ref_abs_avg=16.960615158081055, test_abs_avg=16.949058532714844
production_forward grad[54] vs paper_forward: mean_abs=0.4938875138759613, max_abs=3.25, mean_rel=0.15751764178276062, max_rel=741.7076416015625, norm_rel=0.024251814931631088, ref_abs_avg=20.405319213867188, test_abs_avg=20.404815673828125
production_forward grad[55] vs paper_forward: mean_abs=0.48176810145378113, max_abs=4.0, mean_rel=0.17315182089805603, max_rel=1005.74609375, norm_rel=0.024057215079665184, ref_abs_avg=20.11153221130371, test_abs_avg=20.112510681152344
production_forward grad[56] vs paper_forward: mean_abs=0.40767061710357666, max_abs=1.625, mean_rel=0.1191960871219635, max_rel=11.72509765625, norm_rel=0.02491164207458496, ref_abs_avg=16.103778839111328, test_abs_avg=16.08420753479004
production_forward grad[57] vs paper_forward: mean_abs=0.4551675319671631, max_abs=3.0, mean_rel=0.14717787504196167, max_rel=837.4179077148438, norm_rel=0.023678570985794067, ref_abs_avg=19.248184204101562, test_abs_avg=19.24774932861328
production_forward grad[58] vs paper_forward: mean_abs=0.44666773080825806, max_abs=3.0, mean_rel=0.15241701900959015, max_rel=930.439697265625, norm_rel=0.023549379780888557, ref_abs_avg=18.991424560546875, test_abs_avg=18.99311065673828
production_forward grad[59] vs paper_forward: mean_abs=0.35525286197662354, max_abs=1.5, mean_rel=0.23637551069259644, max_rel=89.1234359741211, norm_rel=0.02436310052871704, ref_abs_avg=15.236005783081055, test_abs_avg=15.229959487915039
production_forward grad[60] vs paper_forward: mean_abs=0.42510926723480225, max_abs=3.0625, mean_rel=0.14324374496936798, max_rel=612.793701171875, norm_rel=0.02313048578798771, ref_abs_avg=18.378768920898438, test_abs_avg=18.379934310913086
production_forward grad[61] vs paper_forward: mean_abs=0.42056334018707275, max_abs=2.9375, mean_rel=0.15031586587429047, max_rel=902.3006591796875, norm_rel=0.023015007376670837, ref_abs_avg=18.29613494873047, test_abs_avg=18.299537658691406
production_forward grad[62] vs paper_forward: mean_abs=0.3393624424934387, max_abs=1.5, mean_rel=0.09908528625965118, max_rel=4.315269470214844, norm_rel=0.023788269609212875, ref_abs_avg=14.333507537841797, test_abs_avg=14.307815551757812
production_forward grad[63] vs paper_forward: mean_abs=0.4049268960952759, max_abs=3.25, mean_rel=0.15278439223766327, max_rel=1073.201171875, norm_rel=0.022628115490078926, ref_abs_avg=17.86745834350586, test_abs_avg=17.868228912353516
production_forward grad[64] vs paper_forward: mean_abs=0.39221975207328796, max_abs=2.75, mean_rel=0.14555178582668304, max_rel=516.2390747070312, norm_rel=0.022447818890213966, ref_abs_avg=17.48320198059082, test_abs_avg=17.483966827392578
production_forward grad[65] vs paper_forward: mean_abs=0.30470672249794006, max_abs=1.25, mean_rel=0.06316733360290527, max_rel=1.947658658027649, norm_rel=0.02139993943274021, ref_abs_avg=14.632768630981445, test_abs_avg=14.631477355957031
production_forward grad[66] vs paper_forward: mean_abs=0.3835265338420868, max_abs=3.0, mean_rel=0.13769353926181793, max_rel=641.776123046875, norm_rel=0.022472839802503586, ref_abs_avg=17.062786102294922, test_abs_avg=17.06417465209961
production_forward grad[67] vs paper_forward: mean_abs=0.37443721294403076, max_abs=2.5625, mean_rel=0.14995622634887695, max_rel=742.7169799804688, norm_rel=0.02192896418273449, ref_abs_avg=17.040346145629883, test_abs_avg=17.039199829101562
production_forward grad[68] vs paper_forward: mean_abs=0.303094744682312, max_abs=1.25, mean_rel=0.09858496487140656, max_rel=11.911457061767578, norm_rel=0.02179686911404133, ref_abs_avg=13.867818832397461, test_abs_avg=13.862428665161133
production_forward grad[69] vs paper_forward: mean_abs=0.3591938018798828, max_abs=2.388671875, mean_rel=0.14544980227947235, max_rel=596.1968383789062, norm_rel=0.02213379740715027, ref_abs_avg=16.23541831970215, test_abs_avg=16.23505973815918
production_forward grad[70] vs paper_forward: mean_abs=0.3562644124031067, max_abs=2.625, mean_rel=0.13003912568092346, max_rel=487.9097595214844, norm_rel=0.021820316091179848, ref_abs_avg=16.338821411132812, test_abs_avg=16.345165252685547
production_forward grad[71] vs paper_forward: mean_abs=0.2754805088043213, max_abs=1.125, mean_rel=0.17209556698799133, max_rel=30.415935516357422, norm_rel=0.021023282781243324, ref_abs_avg=13.103767395019531, test_abs_avg=13.111617088317871
production_forward grad[72] vs paper_forward: mean_abs=0.3434522747993469, max_abs=3.0, mean_rel=0.14524003863334656, max_rel=896.4239501953125, norm_rel=0.02178734540939331, ref_abs_avg=15.759526252746582, test_abs_avg=15.760442733764648
production_forward grad[73] vs paper_forward: mean_abs=0.3395606577396393, max_abs=2.75, mean_rel=0.13756698369979858, max_rel=491.9680480957031, norm_rel=0.021344656124711037, ref_abs_avg=15.916206359863281, test_abs_avg=15.918434143066406
production_forward grad[74] vs paper_forward: mean_abs=0.32492899894714355, max_abs=1.125, mean_rel=0.139021635055542, max_rel=11.611391067504883, norm_rel=0.022560136392712593, ref_abs_avg=14.553180694580078, test_abs_avg=14.557857513427734
production_forward grad[75] vs paper_forward: mean_abs=0.39946821331977844, max_abs=2.75, mean_rel=0.15557631850242615, max_rel=972.6058959960938, norm_rel=0.023115847259759903, ref_abs_avg=17.30860710144043, test_abs_avg=17.310218811035156
production_forward grad[76] vs paper_forward: mean_abs=0.38630521297454834, max_abs=3.21875, mean_rel=0.1387193650007248, max_rel=473.9388732910156, norm_rel=0.022595573216676712, ref_abs_avg=17.09038543701172, test_abs_avg=17.088520050048828
production_forward grad[77] vs paper_forward: mean_abs=0.2881655693054199, max_abs=1.09375, mean_rel=0.15012821555137634, max_rel=32.162620544433594, norm_rel=0.020152496173977852, ref_abs_avg=14.19080638885498, test_abs_avg=14.213544845581055
production_forward grad[78] vs paper_forward: mean_abs=0.35967928171157837, max_abs=3.03125, mean_rel=0.14632856845855713, max_rel=953.33740234375, norm_rel=0.022429022938013077, ref_abs_avg=16.039751052856445, test_abs_avg=16.041423797607422
production_forward grad[79] vs paper_forward: mean_abs=0.35163772106170654, max_abs=2.875, mean_rel=0.14544248580932617, max_rel=742.4296264648438, norm_rel=0.02234557271003723, ref_abs_avg=15.769672393798828, test_abs_avg=15.768888473510742
production_forward grad[80] vs paper_forward: mean_abs=0.2821515202522278, max_abs=1.125, mean_rel=0.22475548088550568, max_rel=80.69062042236328, norm_rel=0.021105438470840454, ref_abs_avg=13.444381713867188, test_abs_avg=13.444494247436523
production_forward grad[81] vs paper_forward: mean_abs=0.3380078077316284, max_abs=3.25, mean_rel=0.13755320012569427, max_rel=703.9185180664062, norm_rel=0.021613745018839836, ref_abs_avg=15.631427764892578, test_abs_avg=15.631452560424805
production_forward grad[82] vs paper_forward: mean_abs=0.3276371955871582, max_abs=3.4375, mean_rel=0.13215766847133636, max_rel=563.4662475585938, norm_rel=0.021246617659926414, ref_abs_avg=15.410087585449219, test_abs_avg=15.406822204589844
production_forward grad[83] vs paper_forward: mean_abs=0.247541606426239, max_abs=1.125, mean_rel=0.13564825057983398, max_rel=44.891414642333984, norm_rel=0.020147962495684624, ref_abs_avg=12.876523971557617, test_abs_avg=12.870849609375
production_forward grad[84] vs paper_forward: mean_abs=0.30861014127731323, max_abs=3.0, mean_rel=0.1343013346195221, max_rel=525.8814086914062, norm_rel=0.021081821992993355, ref_abs_avg=14.673133850097656, test_abs_avg=14.673404693603516
production_forward grad[85] vs paper_forward: mean_abs=0.3038496971130371, max_abs=2.5, mean_rel=0.12935583293437958, max_rel=535.7210083007812, norm_rel=0.020711440593004227, ref_abs_avg=14.724996566772461, test_abs_avg=14.735381126403809
production_forward grad[86] vs paper_forward: mean_abs=0.23787522315979004, max_abs=0.9375, mean_rel=0.15522125363349915, max_rel=39.106937408447266, norm_rel=0.020005522295832634, ref_abs_avg=12.020120620727539, test_abs_avg=12.00943374633789
production_forward grad[87] vs paper_forward: mean_abs=0.29708677530288696, max_abs=2.625, mean_rel=0.1257345974445343, max_rel=381.80841064453125, norm_rel=0.020487604662775993, ref_abs_avg=14.542656898498535, test_abs_avg=14.543041229248047
production_forward grad[88] vs paper_forward: mean_abs=0.2909008860588074, max_abs=2.75, mean_rel=0.1356918066740036, max_rel=418.7450256347656, norm_rel=0.021102607250213623, ref_abs_avg=14.030677795410156, test_abs_avg=14.033698081970215
production_forward grad[89] vs paper_forward: mean_abs=0.2287306785583496, max_abs=0.9375, mean_rel=0.072777658700943, max_rel=8.48630428314209, norm_rel=0.01954670622944832, ref_abs_avg=11.74140739440918, test_abs_avg=11.748523712158203
production_forward grad[90] vs paper_forward: mean_abs=0.273947536945343, max_abs=2.5, mean_rel=0.12377513945102692, max_rel=630.5922241210938, norm_rel=0.020113738253712654, ref_abs_avg=13.711470603942871, test_abs_avg=13.71070384979248
production_forward grad[91] vs paper_forward: mean_abs=0.2687571346759796, max_abs=2.375, mean_rel=0.11689523607492447, max_rel=223.79844665527344, norm_rel=0.019674129784107208, ref_abs_avg=13.769708633422852, test_abs_avg=13.768373489379883
production_forward grad[92] vs paper_forward: mean_abs=0.2267439365386963, max_abs=0.875, mean_rel=0.1332603096961975, max_rel=20.60555076599121, norm_rel=0.01963610015809536, ref_abs_avg=11.4403076171875, test_abs_avg=11.458597183227539
production_forward grad[93] vs paper_forward: mean_abs=0.2660714387893677, max_abs=2.75, mean_rel=0.122727170586586, max_rel=958.6727905273438, norm_rel=0.01961761713027954, ref_abs_avg=13.735881805419922, test_abs_avg=13.73616886138916
production_forward grad[94] vs paper_forward: mean_abs=0.25378960371017456, max_abs=2.53125, mean_rel=0.12567433714866638, max_rel=822.7374267578125, norm_rel=0.018765976652503014, ref_abs_avg=13.606517791748047, test_abs_avg=13.602265357971191
production_forward grad[95] vs paper_forward: mean_abs=0.21861600875854492, max_abs=1.0, mean_rel=0.05249391868710518, max_rel=1.64417564868927, norm_rel=0.019794845953583717, ref_abs_avg=11.200263023376465, test_abs_avg=11.200435638427734
production_forward grad[96] vs paper_forward: mean_abs=0.24436655640602112, max_abs=2.75, mean_rel=0.12025894224643707, max_rel=554.015869140625, norm_rel=0.019115615636110306, ref_abs_avg=12.974870681762695, test_abs_avg=12.974071502685547
production_forward grad[97] vs paper_forward: mean_abs=0.23818360269069672, max_abs=2.5, mean_rel=0.11013717204332352, max_rel=278.0838623046875, norm_rel=0.019214855507016182, ref_abs_avg=12.595630645751953, test_abs_avg=12.600157737731934
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016661765985190868, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008727341890335083, max_abs=0.3125, mean_rel=0.07470688223838806, max_rel=90.42068481445312, norm_rel=0.020422201603651047, ref_abs_avg=0.4605426788330078, test_abs_avg=0.4605451822280884
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.166532039642334, max_abs=57.0, mean_rel=0.13103598356246948, max_rel=203.7401580810547, norm_rel=0.020363101735711098, ref_abs_avg=228.8763427734375, test_abs_avg=228.91043090820312
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.8761875629425049, max_abs=3.5, mean_rel=0.1959933340549469, max_rel=63.62762451171875, norm_rel=0.02179623581469059, ref_abs_avg=40.4653205871582, test_abs_avg=40.42311096191406
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.1170127391815186, max_abs=7.75, mean_rel=0.16580267250537872, max_rel=1219.81591796875, norm_rel=0.023803556337952614, ref_abs_avg=47.12229919433594, test_abs_avg=47.12348175048828
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.0871024131774902, max_abs=7.25, mean_rel=0.16512176394462585, max_rel=1109.55517578125, norm_rel=0.023427927866578102, ref_abs_avg=46.71839141845703, test_abs_avg=46.72471618652344
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.797184944152832, max_abs=3.0, mean_rel=0.1275358349084854, max_rel=7.61667013168335, norm_rel=0.024737631902098656, ref_abs_avg=31.441139221191406, test_abs_avg=31.41402816772461
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9592121839523315, max_abs=6.0, mean_rel=0.1664399951696396, max_rel=1855.82958984375, norm_rel=0.02336767688393593, ref_abs_avg=41.23273468017578, test_abs_avg=41.23399353027344
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.9364880323410034, max_abs=6.041015625, mean_rel=0.15391016006469727, max_rel=1052.5386962890625, norm_rel=0.023150810971856117, ref_abs_avg=40.63377380371094, test_abs_avg=40.63983917236328
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7289378643035889, max_abs=2.5, mean_rel=0.0665058046579361, max_rel=2.2750625610351562, norm_rel=0.022513944655656815, ref_abs_avg=32.28358840942383, test_abs_avg=32.32212829589844
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.8772688508033752, max_abs=5.0, mean_rel=0.16946810483932495, max_rel=1626.9989013671875, norm_rel=0.02314573898911476, ref_abs_avg=38.07654571533203, test_abs_avg=38.07749938964844
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8581949472427368, max_abs=5.5, mean_rel=0.1526200920343399, max_rel=1341.9560546875, norm_rel=0.0229698084294796, ref_abs_avg=37.54872512817383, test_abs_avg=37.55268096923828
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6799783706665039, max_abs=2.75, mean_rel=0.07209563255310059, max_rel=3.8859758377075195, norm_rel=0.023041240870952606, ref_abs_avg=29.19913101196289, test_abs_avg=29.213069915771484
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.8067256212234497, max_abs=5.25, mean_rel=0.1644323170185089, max_rel=1453.7725830078125, norm_rel=0.022939447313547134, ref_abs_avg=35.33815002441406, test_abs_avg=35.34178161621094
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.7873157858848572, max_abs=4.75, mean_rel=0.1558656096458435, max_rel=856.3251342773438, norm_rel=0.02268066816031933, ref_abs_avg=34.88893508911133, test_abs_avg=34.8925895690918
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.6270217895507812, max_abs=2.4375, mean_rel=0.10677826404571533, max_rel=10.9241361618042, norm_rel=0.02278711088001728, ref_abs_avg=27.74383544921875, test_abs_avg=27.728675842285156
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7549415230751038, max_abs=5.0, mean_rel=0.15304192900657654, max_rel=1094.197998046875, norm_rel=0.022729558870196342, ref_abs_avg=33.36769104003906, test_abs_avg=33.36761474609375
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.739398717880249, max_abs=4.375, mean_rel=0.15846270322799683, max_rel=956.0194702148438, norm_rel=0.022512860596179962, ref_abs_avg=33.01008224487305, test_abs_avg=33.01031494140625
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.6154403686523438, max_abs=2.25, mean_rel=0.09099327027797699, max_rel=4.8391900062561035, norm_rel=0.024401908740401268, ref_abs_avg=24.566938400268555, test_abs_avg=24.541549682617188
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.715722918510437, max_abs=4.28125, mean_rel=0.15179452300071716, max_rel=1340.5606689453125, norm_rel=0.022623861208558083, ref_abs_avg=31.79619598388672, test_abs_avg=31.798254013061523
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.6967850923538208, max_abs=4.25, mean_rel=0.15193480253219604, max_rel=777.8707885742188, norm_rel=0.0222164373844862, ref_abs_avg=31.49234390258789, test_abs_avg=31.49099349975586
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5642762184143066, max_abs=2.25, mean_rel=0.08003848791122437, max_rel=3.6141417026519775, norm_rel=0.02241457812488079, ref_abs_avg=25.639265060424805, test_abs_avg=25.65396499633789
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6692819595336914, max_abs=4.25, mean_rel=0.1493687629699707, max_rel=1750.21240234375, norm_rel=0.02235870622098446, ref_abs_avg=30.079837799072266, test_abs_avg=30.078018188476562
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6565077304840088, max_abs=4.375, mean_rel=0.15425172448158264, max_rel=1725.167724609375, norm_rel=0.02201497182250023, ref_abs_avg=29.975627899169922, test_abs_avg=29.97651481628418
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.488344669342041, max_abs=2.0, mean_rel=0.1011798232793808, max_rel=19.563316345214844, norm_rel=0.02130713313817978, ref_abs_avg=23.20960807800293, test_abs_avg=23.238056182861328
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.6361637115478516, max_abs=4.125, mean_rel=0.14546774327754974, max_rel=926.8021240234375, norm_rel=0.022274291142821312, ref_abs_avg=28.689579010009766, test_abs_avg=28.69131088256836
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.6212224960327148, max_abs=3.75, mean_rel=0.1748647540807724, max_rel=1142.624267578125, norm_rel=0.021749835461378098, ref_abs_avg=28.690685272216797, test_abs_avg=28.689136505126953
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6075202822685242, max_abs=2.5, mean_rel=0.14158213138580322, max_rel=20.72307586669922, norm_rel=0.024346938356757164, ref_abs_avg=25.088687896728516, test_abs_avg=25.155872344970703
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7284088730812073, max_abs=4.625, mean_rel=0.17673736810684204, max_rel=1147.731201171875, norm_rel=0.023880405351519585, ref_abs_avg=30.647388458251953, test_abs_avg=30.64760971069336
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.7135846614837646, max_abs=4.75, mean_rel=0.15427188575267792, max_rel=686.140869140625, norm_rel=0.023519089445471764, ref_abs_avg=30.443103790283203, test_abs_avg=30.44109344482422
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.5435718297958374, max_abs=2.375, mean_rel=0.07159053534269333, max_rel=3.34549880027771, norm_rel=0.023402627557516098, ref_abs_avg=22.716825485229492, test_abs_avg=22.696250915527344
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.6793652772903442, max_abs=4.65625, mean_rel=0.16370731592178345, max_rel=1540.81591796875, norm_rel=0.024187205359339714, ref_abs_avg=28.162933349609375, test_abs_avg=28.163965225219727
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6625162959098816, max_abs=4.25, mean_rel=0.1608656495809555, max_rel=817.7774047851562, norm_rel=0.02390512265264988, ref_abs_avg=27.818199157714844, test_abs_avg=27.8211727142334
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5364570617675781, max_abs=2.0, mean_rel=0.15387454628944397, max_rel=37.244483947753906, norm_rel=0.024283304810523987, ref_abs_avg=22.65468978881836, test_abs_avg=22.65845489501953
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.6298893690109253, max_abs=4.5, mean_rel=0.1504717469215393, max_rel=823.4329223632812, norm_rel=0.024036748334765434, ref_abs_avg=26.30065155029297, test_abs_avg=26.302780151367188
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6206644177436829, max_abs=3.96484375, mean_rel=0.15375052392482758, max_rel=815.3382568359375, norm_rel=0.02415655180811882, ref_abs_avg=25.792686462402344, test_abs_avg=25.79625129699707
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.46186161041259766, max_abs=2.01953125, mean_rel=0.12042464315891266, max_rel=9.574637413024902, norm_rel=0.023982051759958267, ref_abs_avg=19.876972198486328, test_abs_avg=19.89316177368164
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.5904750823974609, max_abs=3.6875, mean_rel=0.16263331472873688, max_rel=934.8530883789062, norm_rel=0.023869482800364494, ref_abs_avg=24.794673919677734, test_abs_avg=24.79558753967285
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5821914672851562, max_abs=4.375, mean_rel=0.1541590690612793, max_rel=952.66162109375, norm_rel=0.023644588887691498, ref_abs_avg=24.675701141357422, test_abs_avg=24.674781799316406
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.44835948944091797, max_abs=1.75, mean_rel=0.06838995218276978, max_rel=2.027327299118042, norm_rel=0.023214904591441154, ref_abs_avg=19.830127716064453, test_abs_avg=19.824230194091797
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5560375452041626, max_abs=3.25, mean_rel=0.16713392734527588, max_rel=1271.744384765625, norm_rel=0.02351047657430172, ref_abs_avg=23.669170379638672, test_abs_avg=23.670019149780273
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5490085482597351, max_abs=3.5, mean_rel=0.15620961785316467, max_rel=949.6234130859375, norm_rel=0.023471303284168243, ref_abs_avg=23.465896606445312, test_abs_avg=23.46621322631836
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.44490158557891846, max_abs=1.625, mean_rel=0.1057671383023262, max_rel=11.493125915527344, norm_rel=0.02306286245584488, ref_abs_avg=19.259944915771484, test_abs_avg=19.305391311645508
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5286404490470886, max_abs=3.5, mean_rel=0.157261461019516, max_rel=825.3296508789062, norm_rel=0.023241184651851654, ref_abs_avg=22.7852783203125, test_abs_avg=22.785974502563477
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5195778608322144, max_abs=3.375, mean_rel=0.15271316468715668, max_rel=935.7128295898438, norm_rel=0.022976309061050415, ref_abs_avg=22.652517318725586, test_abs_avg=22.65264129638672
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.4209709167480469, max_abs=1.75, mean_rel=0.07920828461647034, max_rel=2.376999616622925, norm_rel=0.023396499454975128, ref_abs_avg=17.874767303466797, test_abs_avg=17.88567352294922
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.5018361210823059, max_abs=3.0, mean_rel=0.15070050954818726, max_rel=1096.8670654296875, norm_rel=0.022962167859077454, ref_abs_avg=21.885013580322266, test_abs_avg=21.88648223876953
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.49660563468933105, max_abs=3.5, mean_rel=0.14621210098266602, max_rel=806.4334106445312, norm_rel=0.02268856205046177, ref_abs_avg=21.906753540039062, test_abs_avg=21.908475875854492
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.3791837692260742, max_abs=1.625, mean_rel=0.07881095260381699, max_rel=5.208069801330566, norm_rel=0.02145971544086933, ref_abs_avg=17.578731536865234, test_abs_avg=17.60824203491211
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.48109620809555054, max_abs=3.15625, mean_rel=0.1496431827545166, max_rel=654.2178344726562, norm_rel=0.022634224966168404, ref_abs_avg=21.270282745361328, test_abs_avg=21.272293090820312
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.4731657803058624, max_abs=3.21875, mean_rel=0.1313929259777069, max_rel=362.53955078125, norm_rel=0.022132664918899536, ref_abs_avg=21.40924835205078, test_abs_avg=21.406089782714844
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.4675161838531494, max_abs=1.75, mean_rel=0.4280391037464142, max_rel=149.93853759765625, norm_rel=0.02532246895134449, ref_abs_avg=18.148914337158203, test_abs_avg=18.180015563964844
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5433958172798157, max_abs=3.625, mean_rel=0.16049133241176605, max_rel=1212.9451904296875, norm_rel=0.02493930421769619, ref_abs_avg=21.814889907836914, test_abs_avg=21.81566047668457
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5309677124023438, max_abs=3.375, mean_rel=0.16816993057727814, max_rel=790.30419921875, norm_rel=0.02447616308927536, ref_abs_avg=21.705650329589844, test_abs_avg=21.706634521484375
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.3998994827270508, max_abs=1.9375, mean_rel=0.09809868037700653, max_rel=6.615750789642334, norm_rel=0.023584336042404175, ref_abs_avg=16.960615158081055, test_abs_avg=16.941564559936523
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.5021134614944458, max_abs=3.375, mean_rel=0.1596481204032898, max_rel=906.8492431640625, norm_rel=0.024647783488035202, ref_abs_avg=20.405319213867188, test_abs_avg=20.404306411743164
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.4896774888038635, max_abs=4.5, mean_rel=0.17855793237686157, max_rel=1106.3646240234375, norm_rel=0.02445124462246895, ref_abs_avg=20.11153221130371, test_abs_avg=20.111774444580078
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.39218056201934814, max_abs=1.625, mean_rel=0.11231054365634918, max_rel=17.173364639282227, norm_rel=0.024764787405729294, ref_abs_avg=16.103778839111328, test_abs_avg=16.075790405273438
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.46252796053886414, max_abs=3.25, mean_rel=0.1481429636478424, max_rel=748.4072265625, norm_rel=0.024046670645475388, ref_abs_avg=19.248184204101562, test_abs_avg=19.248332977294922
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.4550461173057556, max_abs=3.125, mean_rel=0.15465618669986725, max_rel=835.499755859375, norm_rel=0.02397586777806282, ref_abs_avg=18.991424560546875, test_abs_avg=18.992944717407227
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.354073166847229, max_abs=1.3125, mean_rel=0.18820321559906006, max_rel=63.12318420410156, norm_rel=0.024457627907395363, ref_abs_avg=15.236005783081055, test_abs_avg=15.228570938110352
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.4314868748188019, max_abs=3.125, mean_rel=0.1450444757938385, max_rel=542.4317016601562, norm_rel=0.023464566096663475, ref_abs_avg=18.378768920898438, test_abs_avg=18.37999725341797
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.4256172776222229, max_abs=2.875, mean_rel=0.15363037586212158, max_rel=832.90771484375, norm_rel=0.023307982832193375, ref_abs_avg=18.29613494873047, test_abs_avg=18.30070686340332
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.3558003902435303, max_abs=1.6875, mean_rel=0.10311728715896606, max_rel=5.550490856170654, norm_rel=0.02505904994904995, ref_abs_avg=14.333507537841797, test_abs_avg=14.309623718261719
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.41065913438796997, max_abs=3.0, mean_rel=0.15357628464698792, max_rel=777.774658203125, norm_rel=0.02293584868311882, ref_abs_avg=17.86745834350586, test_abs_avg=17.867919921875
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.3971940279006958, max_abs=2.75, mean_rel=0.14867617189884186, max_rel=597.0324096679688, norm_rel=0.022682221606373787, ref_abs_avg=17.48320198059082, test_abs_avg=17.48407745361328
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.3044724464416504, max_abs=1.25, mean_rel=0.06215297430753708, max_rel=2.4995384216308594, norm_rel=0.02135639451444149, ref_abs_avg=14.632768630981445, test_abs_avg=14.62458610534668
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.3883782625198364, max_abs=2.625, mean_rel=0.14141491055488586, max_rel=552.0601196289062, norm_rel=0.022760407999157906, ref_abs_avg=17.062786102294922, test_abs_avg=17.063549041748047
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.37993335723876953, max_abs=2.875, mean_rel=0.1498819887638092, max_rel=676.7317504882812, norm_rel=0.022239122539758682, ref_abs_avg=17.040346145629883, test_abs_avg=17.038623809814453
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.3064899444580078, max_abs=1.3125, mean_rel=0.10612370073795319, max_rel=14.647102355957031, norm_rel=0.021711615845561028, ref_abs_avg=13.867818832397461, test_abs_avg=13.877685546875
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.36253222823143005, max_abs=2.5, mean_rel=0.1454089730978012, max_rel=634.2289428710938, norm_rel=0.022338207811117172, ref_abs_avg=16.23541831970215, test_abs_avg=16.234952926635742
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.3588774502277374, max_abs=3.0, mean_rel=0.12940791249275208, max_rel=388.3403015136719, norm_rel=0.02198122814297676, ref_abs_avg=16.338821411132812, test_abs_avg=16.34309959411621
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.2682807445526123, max_abs=1.0, mean_rel=0.09469044953584671, max_rel=4.316679000854492, norm_rel=0.02072479948401451, ref_abs_avg=13.103767395019531, test_abs_avg=13.095135688781738
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3467118442058563, max_abs=2.875, mean_rel=0.14841987192630768, max_rel=761.0726928710938, norm_rel=0.02199532836675644, ref_abs_avg=15.759526252746582, test_abs_avg=15.76060676574707
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.342237651348114, max_abs=2.5, mean_rel=0.13943442702293396, max_rel=472.57940673828125, norm_rel=0.021522143855690956, ref_abs_avg=15.916206359863281, test_abs_avg=15.917848587036133
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.3318655490875244, max_abs=1.125, mean_rel=0.13167497515678406, max_rel=9.671347618103027, norm_rel=0.022764069959521294, ref_abs_avg=14.553180694580078, test_abs_avg=14.555550575256348
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.4039958715438843, max_abs=3.21875, mean_rel=0.15807552635669708, max_rel=993.2877807617188, norm_rel=0.02336624078452587, ref_abs_avg=17.30860710144043, test_abs_avg=17.309680938720703
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3931184411048889, max_abs=2.875, mean_rel=0.13936391472816467, max_rel=421.2428283691406, norm_rel=0.022989636287093163, ref_abs_avg=17.09038543701172, test_abs_avg=17.08795166015625
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.2910640239715576, max_abs=1.125, mean_rel=0.15342549979686737, max_rel=30.779279708862305, norm_rel=0.020416606217622757, ref_abs_avg=14.19080638885498, test_abs_avg=14.220771789550781
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.3636869788169861, max_abs=2.875, mean_rel=0.1483413279056549, max_rel=917.4832153320312, norm_rel=0.022666839882731438, ref_abs_avg=16.039751052856445, test_abs_avg=16.04092788696289
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.3548237085342407, max_abs=2.5, mean_rel=0.1492125689983368, max_rel=750.98486328125, norm_rel=0.02247549220919609, ref_abs_avg=15.769672393798828, test_abs_avg=15.768896102905273
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.2758045494556427, max_abs=1.125, mean_rel=0.09355086088180542, max_rel=10.78009033203125, norm_rel=0.02097863145172596, ref_abs_avg=13.444381713867188, test_abs_avg=13.450481414794922
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.34110027551651, max_abs=3.0, mean_rel=0.13879328966140747, max_rel=755.253662109375, norm_rel=0.021805964410305023, ref_abs_avg=15.631427764892578, test_abs_avg=15.630956649780273
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.33046579360961914, max_abs=2.875, mean_rel=0.13359767198562622, max_rel=522.2247924804688, norm_rel=0.021406786516308784, ref_abs_avg=15.410087585449219, test_abs_avg=15.405485153198242
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.2511252760887146, max_abs=1.0625, mean_rel=0.09180668741464615, max_rel=23.944976806640625, norm_rel=0.020320646464824677, ref_abs_avg=12.876523971557617, test_abs_avg=12.873344421386719
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.3113956153392792, max_abs=3.0, mean_rel=0.13756480813026428, max_rel=710.2823486328125, norm_rel=0.021268276497721672, ref_abs_avg=14.673133850097656, test_abs_avg=14.673266410827637
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.30650684237480164, max_abs=2.625, mean_rel=0.13212834298610687, max_rel=564.4998779296875, norm_rel=0.020897062495350838, ref_abs_avg=14.724996566772461, test_abs_avg=14.736323356628418
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.251483678817749, max_abs=0.958984375, mean_rel=0.20384268462657928, max_rel=67.16912841796875, norm_rel=0.02120799571275711, ref_abs_avg=12.020120620727539, test_abs_avg=12.007158279418945
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.29900461435317993, max_abs=2.6875, mean_rel=0.12651130557060242, max_rel=393.20648193359375, norm_rel=0.02061782404780388, ref_abs_avg=14.542656898498535, test_abs_avg=14.542927742004395
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.29096272587776184, max_abs=3.0, mean_rel=0.13531970977783203, max_rel=533.8222045898438, norm_rel=0.021074416115880013, ref_abs_avg=14.030677795410156, test_abs_avg=14.035528182983398
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.24140477180480957, max_abs=1.0, mean_rel=0.07571690529584885, max_rel=7.9658894538879395, norm_rel=0.020228561013936996, ref_abs_avg=11.74140739440918, test_abs_avg=11.744962692260742
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.2751513123512268, max_abs=2.5, mean_rel=0.1259680986404419, max_rel=715.4011840820312, norm_rel=0.020206443965435028, ref_abs_avg=13.711470603942871, test_abs_avg=13.710586547851562
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.27067628502845764, max_abs=2.75, mean_rel=0.1200704500079155, max_rel=432.9141540527344, norm_rel=0.019863132387399673, ref_abs_avg=13.769708633422852, test_abs_avg=13.76678466796875
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.2251981496810913, max_abs=1.0, mean_rel=0.13039323687553406, max_rel=21.44659423828125, norm_rel=0.01939653977751732, ref_abs_avg=11.4403076171875, test_abs_avg=11.44997501373291
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.26736411452293396, max_abs=2.75, mean_rel=0.12425754964351654, max_rel=906.2476196289062, norm_rel=0.01968955248594284, ref_abs_avg=13.735881805419922, test_abs_avg=13.735945701599121
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.2536771297454834, max_abs=2.765625, mean_rel=0.12684428691864014, max_rel=654.520263671875, norm_rel=0.018754377961158752, ref_abs_avg=13.606517791748047, test_abs_avg=13.603270530700684
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.21065902709960938, max_abs=0.875, mean_rel=0.049442682415246964, max_rel=1.588866949081421, norm_rel=0.019267268478870392, ref_abs_avg=11.200263023376465, test_abs_avg=11.202120780944824
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.244613379240036, max_abs=2.5, mean_rel=0.11994640529155731, max_rel=564.3369140625, norm_rel=0.01914401911199093, ref_abs_avg=12.974870681762695, test_abs_avg=12.974075317382812
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.23819153010845184, max_abs=2.75, mean_rel=0.11106238514184952, max_rel=222.43975830078125, norm_rel=0.019215242937207222, ref_abs_avg=12.595630645751953, test_abs_avg=12.601997375488281
liger_forward vs paper_forward output: mean_abs=0.0001522383972769603, max_abs=0.03515625
liger_forward grad[0] vs paper_forward: mean_abs=0.003534650197252631, max_abs=0.1796875, mean_rel=0.025527846068143845, max_rel=49.656341552734375, norm_rel=0.009596342220902443, ref_abs_avg=0.4605426788330078, test_abs_avg=0.46052080392837524
liger_forward grad[1] vs paper_forward: mean_abs=1.5865951776504517, max_abs=16.0, mean_rel=0.040456630289554596, max_rel=58.45310592651367, norm_rel=0.006636442616581917, ref_abs_avg=228.8763427734375, test_abs_avg=228.87448120117188
liger_forward grad[2] vs paper_forward: mean_abs=0.33270931243896484, max_abs=1.375, mean_rel=0.09877032786607742, max_rel=36.120967864990234, norm_rel=0.008364751003682613, ref_abs_avg=40.4653205871582, test_abs_avg=40.47264099121094
liger_forward grad[3] vs paper_forward: mean_abs=0.4011629521846771, max_abs=3.0, mean_rel=0.058865226805210114, max_rel=568.73095703125, norm_rel=0.008837050758302212, ref_abs_avg=47.12229919433594, test_abs_avg=47.122798919677734
liger_forward grad[4] vs paper_forward: mean_abs=0.38931578397750854, max_abs=2.5, mean_rel=0.06141659989953041, max_rel=542.977783203125, norm_rel=0.008671446703374386, ref_abs_avg=46.71839141845703, test_abs_avg=46.72462463378906
liger_forward grad[5] vs paper_forward: mean_abs=0.2780780792236328, max_abs=1.0, mean_rel=0.035953134298324585, max_rel=3.5408599376678467, norm_rel=0.008875560946762562, ref_abs_avg=31.441139221191406, test_abs_avg=31.420209884643555
liger_forward grad[6] vs paper_forward: mean_abs=0.3407415747642517, max_abs=2.25, mean_rel=0.05553003400564194, max_rel=688.3863525390625, norm_rel=0.00858265534043312, ref_abs_avg=41.23273468017578, test_abs_avg=41.231590270996094
liger_forward grad[7] vs paper_forward: mean_abs=0.3272831439971924, max_abs=2.0, mean_rel=0.060235921293497086, max_rel=1122.497314453125, norm_rel=0.008391664363443851, ref_abs_avg=40.63377380371094, test_abs_avg=40.63291931152344
liger_forward grad[8] vs paper_forward: mean_abs=0.2594623565673828, max_abs=1.0, mean_rel=0.024503042921423912, max_rel=1.6348984241485596, norm_rel=0.008321722969412804, ref_abs_avg=32.28358840942383, test_abs_avg=32.28197479248047
liger_forward grad[9] vs paper_forward: mean_abs=0.3073878884315491, max_abs=2.0, mean_rel=0.05898645520210266, max_rel=543.5557250976562, norm_rel=0.00839872658252716, ref_abs_avg=38.07654571533203, test_abs_avg=38.076133728027344
liger_forward grad[10] vs paper_forward: mean_abs=0.29630526900291443, max_abs=1.8125, mean_rel=0.053173720836639404, max_rel=338.1163024902344, norm_rel=0.008233712054789066, ref_abs_avg=37.54872512817383, test_abs_avg=37.54932403564453
liger_forward grad[11] vs paper_forward: mean_abs=0.22725677490234375, max_abs=1.125, mean_rel=0.02414874918758869, max_rel=1.2440474033355713, norm_rel=0.008220852352678776, ref_abs_avg=29.19913101196289, test_abs_avg=29.21146011352539
liger_forward grad[12] vs paper_forward: mean_abs=0.27839067578315735, max_abs=2.0, mean_rel=0.055652350187301636, max_rel=585.530029296875, norm_rel=0.008218277245759964, ref_abs_avg=35.33815002441406, test_abs_avg=35.33821105957031
liger_forward grad[13] vs paper_forward: mean_abs=0.26881325244903564, max_abs=1.6015625, mean_rel=0.05593959987163544, max_rel=501.9793701171875, norm_rel=0.008051680400967598, ref_abs_avg=34.88893508911133, test_abs_avg=34.8892936706543
liger_forward grad[14] vs paper_forward: mean_abs=0.22251510620117188, max_abs=0.8125, mean_rel=0.039749953895807266, max_rel=5.717739105224609, norm_rel=0.008172634989023209, ref_abs_avg=27.74383544921875, test_abs_avg=27.74327850341797
liger_forward grad[15] vs paper_forward: mean_abs=0.25763455033302307, max_abs=1.59375, mean_rel=0.052496764808893204, max_rel=326.7462463378906, norm_rel=0.008068608120083809, ref_abs_avg=33.36769104003906, test_abs_avg=33.366943359375
liger_forward grad[16] vs paper_forward: mean_abs=0.2488066554069519, max_abs=1.5, mean_rel=0.05625364929437637, max_rel=379.0169372558594, norm_rel=0.00789607409387827, ref_abs_avg=33.01008224487305, test_abs_avg=33.01088333129883
liger_forward grad[17] vs paper_forward: mean_abs=0.2137068510055542, max_abs=0.8125, mean_rel=0.0400380901992321, max_rel=2.0083768367767334, norm_rel=0.00874872226268053, ref_abs_avg=24.566938400268555, test_abs_avg=24.574918746948242
liger_forward grad[18] vs paper_forward: mean_abs=0.24236613512039185, max_abs=2.0, mean_rel=0.053107332438230515, max_rel=536.0116577148438, norm_rel=0.007969817146658897, ref_abs_avg=31.79619598388672, test_abs_avg=31.795833587646484
liger_forward grad[19] vs paper_forward: mean_abs=0.23288798332214355, max_abs=1.5, mean_rel=0.0513533353805542, max_rel=244.067626953125, norm_rel=0.007746406830847263, ref_abs_avg=31.49234390258789, test_abs_avg=31.49176025390625
liger_forward grad[20] vs paper_forward: mean_abs=0.18628406524658203, max_abs=0.75, mean_rel=0.02848012000322342, max_rel=1.9299339056015015, norm_rel=0.007614531088620424, ref_abs_avg=25.639265060424805, test_abs_avg=25.642715454101562
liger_forward grad[21] vs paper_forward: mean_abs=0.22459468245506287, max_abs=1.5, mean_rel=0.050795648247003555, max_rel=510.1769104003906, norm_rel=0.0078213419765234, ref_abs_avg=30.079837799072266, test_abs_avg=30.079822540283203
liger_forward grad[22] vs paper_forward: mean_abs=0.2156488299369812, max_abs=1.5, mean_rel=0.04903525114059448, max_rel=309.3995056152344, norm_rel=0.007560747675597668, ref_abs_avg=29.975627899169922, test_abs_avg=29.97736930847168
liger_forward grad[23] vs paper_forward: mean_abs=0.1779322624206543, max_abs=0.6875, mean_rel=0.038014061748981476, max_rel=5.620729446411133, norm_rel=0.00806837435811758, ref_abs_avg=23.20960807800293, test_abs_avg=23.217222213745117
liger_forward grad[24] vs paper_forward: mean_abs=0.21118928492069244, max_abs=1.5, mean_rel=0.04982910305261612, max_rel=287.933349609375, norm_rel=0.007723772432655096, ref_abs_avg=28.689579010009766, test_abs_avg=28.68947982788086
liger_forward grad[25] vs paper_forward: mean_abs=0.2023775577545166, max_abs=1.25, mean_rel=0.052233435213565826, max_rel=211.40892028808594, norm_rel=0.007436092011630535, ref_abs_avg=28.690685272216797, test_abs_avg=28.690914154052734
liger_forward grad[26] vs paper_forward: mean_abs=0.1903211921453476, max_abs=0.8125, mean_rel=0.029561396688222885, max_rel=2.1522305011749268, norm_rel=0.00789396557956934, ref_abs_avg=25.088687896728516, test_abs_avg=25.10018539428711
liger_forward grad[27] vs paper_forward: mean_abs=0.22926746308803558, max_abs=1.5, mean_rel=0.051918499171733856, max_rel=295.552001953125, norm_rel=0.007844281382858753, ref_abs_avg=30.647388458251953, test_abs_avg=30.646739959716797
liger_forward grad[28] vs paper_forward: mean_abs=0.2213975191116333, max_abs=1.5, mean_rel=0.046536222100257874, max_rel=231.59832763671875, norm_rel=0.007648342289030552, ref_abs_avg=30.443103790283203, test_abs_avg=30.44308090209961
liger_forward grad[29] vs paper_forward: mean_abs=0.16908729076385498, max_abs=0.75, mean_rel=0.03163691982626915, max_rel=4.349855422973633, norm_rel=0.00771196186542511, ref_abs_avg=22.716825485229492, test_abs_avg=22.722482681274414
liger_forward grad[30] vs paper_forward: mean_abs=0.20577691495418549, max_abs=1.5, mean_rel=0.050780460238456726, max_rel=400.9855651855469, norm_rel=0.007660390343517065, ref_abs_avg=28.162933349609375, test_abs_avg=28.162254333496094
liger_forward grad[31] vs paper_forward: mean_abs=0.1973240077495575, max_abs=1.25, mean_rel=0.050449639558792114, max_rel=493.1318359375, norm_rel=0.007461695931851864, ref_abs_avg=27.818199157714844, test_abs_avg=27.81798553466797
liger_forward grad[32] vs paper_forward: mean_abs=0.17051076889038086, max_abs=0.75, mean_rel=0.0484495609998703, max_rel=11.963584899902344, norm_rel=0.007881663739681244, ref_abs_avg=22.65468978881836, test_abs_avg=22.673437118530273
liger_forward grad[33] vs paper_forward: mean_abs=0.18822699785232544, max_abs=1.25, mean_rel=0.04559400677680969, max_rel=308.71844482421875, norm_rel=0.007515789475291967, ref_abs_avg=26.30065155029297, test_abs_avg=26.300405502319336
liger_forward grad[34] vs paper_forward: mean_abs=0.1804603934288025, max_abs=1.25, mean_rel=0.0449148491024971, max_rel=242.77330017089844, norm_rel=0.007383790332823992, ref_abs_avg=25.792686462402344, test_abs_avg=25.792518615722656
liger_forward grad[35] vs paper_forward: mean_abs=0.1512131690979004, max_abs=0.53125, mean_rel=0.04097285121679306, max_rel=2.9983458518981934, norm_rel=0.007799986284226179, ref_abs_avg=19.876972198486328, test_abs_avg=19.872331619262695
liger_forward grad[36] vs paper_forward: mean_abs=0.17481961846351624, max_abs=1.1875, mean_rel=0.04920182377099991, max_rel=363.4989929199219, norm_rel=0.00740828737616539, ref_abs_avg=24.794673919677734, test_abs_avg=24.79412078857422
liger_forward grad[37] vs paper_forward: mean_abs=0.16901138424873352, max_abs=1.09375, mean_rel=0.04438244551420212, max_rel=483.618896484375, norm_rel=0.007228213828057051, ref_abs_avg=24.675701141357422, test_abs_avg=24.675270080566406
liger_forward grad[38] vs paper_forward: mean_abs=0.1317596435546875, max_abs=0.5, mean_rel=0.022013211622834206, max_rel=1.4272879362106323, norm_rel=0.007006044965237379, ref_abs_avg=19.830127716064453, test_abs_avg=19.84014129638672
liger_forward grad[39] vs paper_forward: mean_abs=0.1630815863609314, max_abs=1.125, mean_rel=0.04611505568027496, max_rel=271.80145263671875, norm_rel=0.007251072209328413, ref_abs_avg=23.669170379638672, test_abs_avg=23.6693058013916
liger_forward grad[40] vs paper_forward: mean_abs=0.15781962871551514, max_abs=1.0625, mean_rel=0.042180292308330536, max_rel=281.15814208984375, norm_rel=0.007105089258402586, ref_abs_avg=23.465896606445312, test_abs_avg=23.465591430664062
liger_forward grad[41] vs paper_forward: mean_abs=0.12262725830078125, max_abs=0.5, mean_rel=0.024838274344801903, max_rel=1.825596570968628, norm_rel=0.006811067927628756, ref_abs_avg=19.259944915771484, test_abs_avg=19.25402069091797
liger_forward grad[42] vs paper_forward: mean_abs=0.1542644500732422, max_abs=1.25, mean_rel=0.04626069962978363, max_rel=351.34893798828125, norm_rel=0.0071418145671486855, ref_abs_avg=22.7852783203125, test_abs_avg=22.785198211669922
liger_forward grad[43] vs paper_forward: mean_abs=0.1488865613937378, max_abs=1.0, mean_rel=0.042119380086660385, max_rel=166.95855712890625, norm_rel=0.00695232255384326, ref_abs_avg=22.652517318725586, test_abs_avg=22.650955200195312
liger_forward grad[44] vs paper_forward: mean_abs=0.1221923828125, max_abs=0.5, mean_rel=0.024865953251719475, max_rel=1.7984706163406372, norm_rel=0.007163444999605417, ref_abs_avg=17.874767303466797, test_abs_avg=17.87857437133789
liger_forward grad[45] vs paper_forward: mean_abs=0.14571568369865417, max_abs=1.0, mean_rel=0.04586068540811539, max_rel=393.2674255371094, norm_rel=0.007026956882327795, ref_abs_avg=21.885013580322266, test_abs_avg=21.88482093811035
liger_forward grad[46] vs paper_forward: mean_abs=0.14119088649749756, max_abs=1.0, mean_rel=0.041161976754665375, max_rel=175.62449645996094, norm_rel=0.006828535348176956, ref_abs_avg=21.906753540039062, test_abs_avg=21.905986785888672
liger_forward grad[47] vs paper_forward: mean_abs=0.12008380889892578, max_abs=0.5, mean_rel=0.02446146123111248, max_rel=0.7982969880104065, norm_rel=0.0070416247472167015, ref_abs_avg=17.578731536865234, test_abs_avg=17.577003479003906
liger_forward grad[48] vs paper_forward: mean_abs=0.13865183293819427, max_abs=1.0, mean_rel=0.04322420060634613, max_rel=348.0379943847656, norm_rel=0.006895722821354866, ref_abs_avg=21.270282745361328, test_abs_avg=21.271041870117188
liger_forward grad[49] vs paper_forward: mean_abs=0.133865624666214, max_abs=1.0, mean_rel=0.038181886076927185, max_rel=201.28892517089844, norm_rel=0.0066452790051698685, ref_abs_avg=21.40924835205078, test_abs_avg=21.40912628173828
liger_forward grad[50] vs paper_forward: mean_abs=0.1271120309829712, max_abs=0.5, mean_rel=0.0663134828209877, max_rel=16.21284294128418, norm_rel=0.007439655717462301, ref_abs_avg=18.148914337158203, test_abs_avg=18.157569885253906
liger_forward grad[51] vs paper_forward: mean_abs=0.15714702010154724, max_abs=1.0, mean_rel=0.047204017639160156, max_rel=202.4213104248047, norm_rel=0.007558122742921114, ref_abs_avg=21.814889907836914, test_abs_avg=21.815799713134766
liger_forward grad[52] vs paper_forward: mean_abs=0.15156657993793488, max_abs=1.0, mean_rel=0.048365019261837006, max_rel=237.83767700195312, norm_rel=0.007345276884734631, ref_abs_avg=21.705650329589844, test_abs_avg=21.706424713134766
liger_forward grad[53] vs paper_forward: mean_abs=0.12306451797485352, max_abs=0.5, mean_rel=0.029406292364001274, max_rel=3.937138557434082, norm_rel=0.007399122696369886, ref_abs_avg=16.960615158081055, test_abs_avg=16.96065902709961
liger_forward grad[54] vs paper_forward: mean_abs=0.14381566643714905, max_abs=1.125, mean_rel=0.04599159210920334, max_rel=222.99301147460938, norm_rel=0.007404065225273371, ref_abs_avg=20.405319213867188, test_abs_avg=20.40500831604004
liger_forward grad[55] vs paper_forward: mean_abs=0.13814735412597656, max_abs=1.0, mean_rel=0.04707614704966545, max_rel=284.5321044921875, norm_rel=0.007248712237924337, ref_abs_avg=20.11153221130371, test_abs_avg=20.110809326171875
liger_forward grad[56] vs paper_forward: mean_abs=0.11170603334903717, max_abs=0.5, mean_rel=0.027218451723456383, max_rel=1.2304298877716064, norm_rel=0.007287259213626385, ref_abs_avg=16.103778839111328, test_abs_avg=16.111248016357422
liger_forward grad[57] vs paper_forward: mean_abs=0.13120484352111816, max_abs=1.0, mean_rel=0.043583065271377563, max_rel=201.2388153076172, norm_rel=0.007176559884101152, ref_abs_avg=19.248184204101562, test_abs_avg=19.248676300048828
liger_forward grad[58] vs paper_forward: mean_abs=0.12688925862312317, max_abs=1.0, mean_rel=0.04088906943798065, max_rel=225.50827026367188, norm_rel=0.007065738085657358, ref_abs_avg=18.991424560546875, test_abs_avg=18.992168426513672
liger_forward grad[59] vs paper_forward: mean_abs=0.10423719882965088, max_abs=0.4375, mean_rel=0.05515643209218979, max_rel=17.93828010559082, norm_rel=0.007330181542783976, ref_abs_avg=15.236005783081055, test_abs_avg=15.23768424987793
liger_forward grad[60] vs paper_forward: mean_abs=0.1222103014588356, max_abs=1.0, mean_rel=0.043188612908124924, max_rel=209.824462890625, norm_rel=0.007005849853157997, ref_abs_avg=18.378768920898438, test_abs_avg=18.379146575927734
liger_forward grad[61] vs paper_forward: mean_abs=0.11874428391456604, max_abs=1.0, mean_rel=0.04541021212935448, max_rel=187.3075714111328, norm_rel=0.006873940583318472, ref_abs_avg=18.29613494873047, test_abs_avg=18.296092987060547
liger_forward grad[62] vs paper_forward: mean_abs=0.09856146574020386, max_abs=0.5, mean_rel=0.03131742402911186, max_rel=1.5341448783874512, norm_rel=0.007189551834017038, ref_abs_avg=14.333507537841797, test_abs_avg=14.337727546691895
liger_forward grad[63] vs paper_forward: mean_abs=0.11684460937976837, max_abs=1.0, mean_rel=0.04454757273197174, max_rel=274.42645263671875, norm_rel=0.006893783342093229, ref_abs_avg=17.86745834350586, test_abs_avg=17.86739158630371
liger_forward grad[64] vs paper_forward: mean_abs=0.11226916313171387, max_abs=1.0, mean_rel=0.041465215384960175, max_rel=161.3848419189453, norm_rel=0.006797593552619219, ref_abs_avg=17.48320198059082, test_abs_avg=17.482128143310547
liger_forward grad[65] vs paper_forward: mean_abs=0.09186935424804688, max_abs=0.375, mean_rel=0.021752968430519104, max_rel=1.0738013982772827, norm_rel=0.006890262011438608, ref_abs_avg=14.632768630981445, test_abs_avg=14.629287719726562
liger_forward grad[66] vs paper_forward: mean_abs=0.10938261449337006, max_abs=1.0, mean_rel=0.040808167308568954, max_rel=149.3866729736328, norm_rel=0.006789098493754864, ref_abs_avg=17.062786102294922, test_abs_avg=17.063068389892578
liger_forward grad[67] vs paper_forward: mean_abs=0.10584458708763123, max_abs=1.0, mean_rel=0.040151242166757584, max_rel=126.48419189453125, norm_rel=0.006593689322471619, ref_abs_avg=17.040346145629883, test_abs_avg=17.03868293762207
liger_forward grad[68] vs paper_forward: mean_abs=0.08749938011169434, max_abs=0.375, mean_rel=0.03021765500307083, max_rel=5.589076042175293, norm_rel=0.006659899838268757, ref_abs_avg=13.867818832397461, test_abs_avg=13.871932983398438
liger_forward grad[69] vs paper_forward: mean_abs=0.10247239470481873, max_abs=1.0, mean_rel=0.040202073752880096, max_rel=125.5675277709961, norm_rel=0.006698993965983391, ref_abs_avg=16.23541831970215, test_abs_avg=16.235620498657227
liger_forward grad[70] vs paper_forward: mean_abs=0.0997757613658905, max_abs=1.0, mean_rel=0.036669448018074036, max_rel=114.26766204833984, norm_rel=0.006499137729406357, ref_abs_avg=16.338821411132812, test_abs_avg=16.339384078979492
liger_forward grad[71] vs paper_forward: mean_abs=0.07904219627380371, max_abs=0.3125, mean_rel=0.03939007595181465, max_rel=4.521397590637207, norm_rel=0.0064654890447855, ref_abs_avg=13.103767395019531, test_abs_avg=13.103862762451172
liger_forward grad[72] vs paper_forward: mean_abs=0.09804484248161316, max_abs=1.0, mean_rel=0.040388792753219604, max_rel=283.4601745605469, norm_rel=0.006603009067475796, ref_abs_avg=15.759526252746582, test_abs_avg=15.759693145751953
liger_forward grad[73] vs paper_forward: mean_abs=0.09570157527923584, max_abs=0.6875, mean_rel=0.039432063698768616, max_rel=201.9619140625, norm_rel=0.006398226600140333, ref_abs_avg=15.916206359863281, test_abs_avg=15.914534568786621
liger_forward grad[74] vs paper_forward: mean_abs=0.09707140922546387, max_abs=0.40625, mean_rel=0.030186861753463745, max_rel=1.4954501390457153, norm_rel=0.006981079466640949, ref_abs_avg=14.553180694580078, test_abs_avg=14.555734634399414
liger_forward grad[75] vs paper_forward: mean_abs=0.11902986466884613, max_abs=1.0, mean_rel=0.04489591717720032, max_rel=189.07586669921875, norm_rel=0.007229201029986143, ref_abs_avg=17.30860710144043, test_abs_avg=17.309036254882812
liger_forward grad[76] vs paper_forward: mean_abs=0.11374826729297638, max_abs=1.0, mean_rel=0.04141247272491455, max_rel=168.0120086669922, norm_rel=0.007040572352707386, ref_abs_avg=17.09038543701172, test_abs_avg=17.08930206298828
liger_forward grad[77] vs paper_forward: mean_abs=0.08939218521118164, max_abs=0.375, mean_rel=0.04535753279924393, max_rel=9.145403861999512, norm_rel=0.006649894639849663, ref_abs_avg=14.19080638885498, test_abs_avg=14.18665599822998
liger_forward grad[78] vs paper_forward: mean_abs=0.10632479935884476, max_abs=0.75, mean_rel=0.04190201312303543, max_rel=164.1326141357422, norm_rel=0.006985188461840153, ref_abs_avg=16.039751052856445, test_abs_avg=16.039527893066406
liger_forward grad[79] vs paper_forward: mean_abs=0.10266120731830597, max_abs=1.0, mean_rel=0.04067620635032654, max_rel=119.42753601074219, norm_rel=0.006902087479829788, ref_abs_avg=15.769672393798828, test_abs_avg=15.768291473388672
liger_forward grad[80] vs paper_forward: mean_abs=0.0940418541431427, max_abs=0.375, mean_rel=0.05002359300851822, max_rel=16.234575271606445, norm_rel=0.007254306226968765, ref_abs_avg=13.444381713867188, test_abs_avg=13.441831588745117
liger_forward grad[81] vs paper_forward: mean_abs=0.09930972754955292, max_abs=1.0, mean_rel=0.040212828665971756, max_rel=176.35736083984375, norm_rel=0.006723560858517885, ref_abs_avg=15.631427764892578, test_abs_avg=15.631595611572266
liger_forward grad[82] vs paper_forward: mean_abs=0.09454728662967682, max_abs=1.0, mean_rel=0.03651037812232971, max_rel=70.49148559570312, norm_rel=0.0065210494212806225, ref_abs_avg=15.410087585449219, test_abs_avg=15.409838676452637
liger_forward grad[83] vs paper_forward: mean_abs=0.0760001540184021, max_abs=0.375, mean_rel=0.04794173687696457, max_rel=14.780909538269043, norm_rel=0.0064511988312006, ref_abs_avg=12.876523971557617, test_abs_avg=12.873773574829102
liger_forward grad[84] vs paper_forward: mean_abs=0.09022773802280426, max_abs=1.0, mean_rel=0.03855636715888977, max_rel=157.9908447265625, norm_rel=0.006539083085954189, ref_abs_avg=14.673133850097656, test_abs_avg=14.673199653625488
liger_forward grad[85] vs paper_forward: mean_abs=0.08892916142940521, max_abs=1.0, mean_rel=0.03894735500216484, max_rel=196.8781280517578, norm_rel=0.006457043346017599, ref_abs_avg=14.724996566772461, test_abs_avg=14.725611686706543
liger_forward grad[86] vs paper_forward: mean_abs=0.06695884466171265, max_abs=0.25, mean_rel=0.02225644886493683, max_rel=2.0797781944274902, norm_rel=0.006052768789231777, ref_abs_avg=12.020120620727539, test_abs_avg=12.015933990478516
liger_forward grad[87] vs paper_forward: mean_abs=0.08605039864778519, max_abs=1.0, mean_rel=0.036648910492658615, max_rel=283.0250244140625, norm_rel=0.006325945723801851, ref_abs_avg=14.542656898498535, test_abs_avg=14.542929649353027
liger_forward grad[88] vs paper_forward: mean_abs=0.0821191817522049, max_abs=1.0, mean_rel=0.035993918776512146, max_rel=107.20066833496094, norm_rel=0.006334442645311356, ref_abs_avg=14.030677795410156, test_abs_avg=14.029720306396484
liger_forward grad[89] vs paper_forward: mean_abs=0.07122421264648438, max_abs=0.28125, mean_rel=0.025446448475122452, max_rel=2.85636568069458, norm_rel=0.006461964454501867, ref_abs_avg=11.74140739440918, test_abs_avg=11.735528945922852
liger_forward grad[90] vs paper_forward: mean_abs=0.07961557805538177, max_abs=0.875, mean_rel=0.036183346062898636, max_rel=214.90638732910156, norm_rel=0.0062524727545678616, ref_abs_avg=13.711470603942871, test_abs_avg=13.711551666259766
liger_forward grad[91] vs paper_forward: mean_abs=0.07795874774456024, max_abs=0.75, mean_rel=0.03406573086977005, max_rel=92.28766632080078, norm_rel=0.006116311531513929, ref_abs_avg=13.769708633422852, test_abs_avg=13.77169418334961
liger_forward grad[92] vs paper_forward: mean_abs=0.06103566288948059, max_abs=0.25, mean_rel=0.04010632634162903, max_rel=5.598191738128662, norm_rel=0.005855313967913389, ref_abs_avg=11.4403076171875, test_abs_avg=11.445605278015137
liger_forward grad[93] vs paper_forward: mean_abs=0.07604697346687317, max_abs=0.75, mean_rel=0.03421253338456154, max_rel=156.8206024169922, norm_rel=0.006022916175425053, ref_abs_avg=13.735881805419922, test_abs_avg=13.735986709594727
liger_forward grad[94] vs paper_forward: mean_abs=0.07320617139339447, max_abs=1.0, mean_rel=0.03399365022778511, max_rel=103.70689392089844, norm_rel=0.005861701909452677, ref_abs_avg=13.606517791748047, test_abs_avg=13.607023239135742
liger_forward grad[95] vs paper_forward: mean_abs=0.05976390838623047, max_abs=0.25, mean_rel=0.015955530107021332, max_rel=0.8126963376998901, norm_rel=0.005846066400408745, ref_abs_avg=11.200263023376465, test_abs_avg=11.204538345336914
liger_forward grad[96] vs paper_forward: mean_abs=0.06970641762018204, max_abs=1.0, mean_rel=0.03375844284892082, max_rel=148.4876708984375, norm_rel=0.005887703504413366, ref_abs_avg=12.974870681762695, test_abs_avg=12.974990844726562
liger_forward grad[97] vs paper_forward: mean_abs=0.06765181571245193, max_abs=0.75, mean_rel=0.03297031670808792, max_rel=88.58012390136719, norm_rel=0.005911723244935274, ref_abs_avg=12.595630645751953, test_abs_avg=12.596646308898926
identity layers + randn queries
liger_forward fwd+bwd:  54.439 ms
liger_forward bwd-only: 42.029 ms
liger_forward peak allocated: fwd=7.727 GiB, fwd+bwd=7.727 GiB
liger_forward peak reserved:  fwd=7.775 GiB, fwd+bwd=8.088 GiB
torch_compile_phases_forward fwd+bwd:  48.548 ms
torch_compile_phases_forward bwd-only: 39.349 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.754 GiB
paper_forward fwd+bwd:  112.802 ms
paper_forward bwd-only: 88.939 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB
production_forward fwd+bwd:  33.805 ms
production_forward bwd-only: 28.878 ms
production_forward peak allocated: fwd=1.174 GiB, fwd+bwd=5.176 GiB
production_forward peak reserved:  fwd=1.244 GiB, fwd+bwd=5.244 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.00168109149672091, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008574072271585464, max_abs=0.4814453125, mean_rel=0.07296925038099289, max_rel=107.34735870361328, norm_rel=0.01992053911089897, ref_abs_avg=0.4668831527233124, test_abs_avg=0.46690595149993896
production_forward grad[1] vs paper_forward: mean_abs=5.23793888092041, max_abs=48.0, mean_rel=0.16701942682266235, max_rel=202.06687927246094, norm_rel=0.020716428756713867, ref_abs_avg=228.2950897216797, test_abs_avg=228.2538604736328
production_forward grad[2] vs paper_forward: mean_abs=0.8869075775146484, max_abs=3.359375, mean_rel=0.07041102647781372, max_rel=3.651444673538208, norm_rel=0.02176671288907528, ref_abs_avg=40.53485870361328, test_abs_avg=40.50984573364258
production_forward grad[3] vs paper_forward: mean_abs=1.0908143520355225, max_abs=8.5, mean_rel=0.15386728942394257, max_rel=1599.62255859375, norm_rel=0.022875672206282616, ref_abs_avg=47.95964431762695, test_abs_avg=47.96259307861328
production_forward grad[4] vs paper_forward: mean_abs=1.0667277574539185, max_abs=6.75, mean_rel=0.14616899192333221, max_rel=635.6431884765625, norm_rel=0.022547060623764992, ref_abs_avg=47.58937072753906, test_abs_avg=47.58713150024414
production_forward grad[5] vs paper_forward: mean_abs=0.7069294452667236, max_abs=2.5, mean_rel=0.11574467271566391, max_rel=14.84168815612793, norm_rel=0.021508701145648956, ref_abs_avg=32.12895965576172, test_abs_avg=32.177146911621094
production_forward grad[6] vs paper_forward: mean_abs=0.9513707756996155, max_abs=6.25, mean_rel=0.1607317477464676, max_rel=1548.6275634765625, norm_rel=0.022632598876953125, ref_abs_avg=42.23849868774414, test_abs_avg=42.24324035644531
production_forward grad[7] vs paper_forward: mean_abs=0.9245476722717285, max_abs=5.5, mean_rel=0.13943631947040558, max_rel=468.39410400390625, norm_rel=0.022236984223127365, ref_abs_avg=41.77604675292969, test_abs_avg=41.774375915527344
production_forward grad[8] vs paper_forward: mean_abs=0.6963005065917969, max_abs=3.375, mean_rel=0.07878981530666351, max_rel=5.149198055267334, norm_rel=0.021349648013710976, ref_abs_avg=33.07938766479492, test_abs_avg=33.0503044128418
production_forward grad[9] vs paper_forward: mean_abs=0.870086669921875, max_abs=5.5, mean_rel=0.15134090185165405, max_rel=1012.7249145507812, norm_rel=0.022405143827199936, ref_abs_avg=38.99879837036133, test_abs_avg=39.002479553222656
production_forward grad[10] vs paper_forward: mean_abs=0.8484147787094116, max_abs=5.1875, mean_rel=0.1643742024898529, max_rel=2956.19189453125, norm_rel=0.022195080295205116, ref_abs_avg=38.46556091308594, test_abs_avg=38.464111328125
production_forward grad[11] vs paper_forward: mean_abs=0.6530580520629883, max_abs=2.5, mean_rel=0.091228187084198, max_rel=8.114026069641113, norm_rel=0.022191286087036133, ref_abs_avg=29.76634407043457, test_abs_avg=29.813304901123047
production_forward grad[12] vs paper_forward: mean_abs=0.8003144264221191, max_abs=5.0, mean_rel=0.16511864960193634, max_rel=1733.1614990234375, norm_rel=0.022459393367171288, ref_abs_avg=35.787044525146484, test_abs_avg=35.78915786743164
production_forward grad[13] vs paper_forward: mean_abs=0.7785618901252747, max_abs=5.0, mean_rel=0.165500670671463, max_rel=1886.179931640625, norm_rel=0.02203315682709217, ref_abs_avg=35.511207580566406, test_abs_avg=35.51136779785156
production_forward grad[14] vs paper_forward: mean_abs=0.6106035709381104, max_abs=2.0, mean_rel=0.08208838105201721, max_rel=7.469810962677002, norm_rel=0.023057663813233376, ref_abs_avg=26.81689453125, test_abs_avg=26.84493064880371
production_forward grad[15] vs paper_forward: mean_abs=0.7423108816146851, max_abs=4.5, mean_rel=0.1479645073413849, max_rel=791.8045043945312, norm_rel=0.022181116044521332, ref_abs_avg=33.61362075805664, test_abs_avg=33.6165771484375
production_forward grad[16] vs paper_forward: mean_abs=0.724140465259552, max_abs=4.25, mean_rel=0.14010152220726013, max_rel=798.068359375, norm_rel=0.021919669583439827, ref_abs_avg=33.20844650268555, test_abs_avg=33.209205627441406
production_forward grad[17] vs paper_forward: mean_abs=0.5236797332763672, max_abs=2.25, mean_rel=0.08409285545349121, max_rel=9.36928653717041, norm_rel=0.02096565254032612, ref_abs_avg=25.39047622680664, test_abs_avg=25.383625030517578
production_forward grad[18] vs paper_forward: mean_abs=0.6986002922058105, max_abs=4.25, mean_rel=0.14130447804927826, max_rel=1224.87646484375, norm_rel=0.022049959748983383, ref_abs_avg=31.80097770690918, test_abs_avg=31.804523468017578
production_forward grad[19] vs paper_forward: mean_abs=0.6863285303115845, max_abs=4.0, mean_rel=0.14389309287071228, max_rel=628.1697998046875, norm_rel=0.021829908713698387, ref_abs_avg=31.59693145751953, test_abs_avg=31.598098754882812
production_forward grad[20] vs paper_forward: mean_abs=0.54831862449646, max_abs=2.25, mean_rel=0.45094233751296997, max_rel=186.42149353027344, norm_rel=0.02182079292833805, ref_abs_avg=25.217308044433594, test_abs_avg=25.169097900390625
production_forward grad[21] vs paper_forward: mean_abs=0.6683887839317322, max_abs=4.02734375, mean_rel=0.14702492952346802, max_rel=1483.908935546875, norm_rel=0.022030392661690712, ref_abs_avg=30.483421325683594, test_abs_avg=30.48739242553711
production_forward grad[22] vs paper_forward: mean_abs=0.6522486209869385, max_abs=4.0, mean_rel=0.15063902735710144, max_rel=887.7826538085938, norm_rel=0.021691085770726204, ref_abs_avg=30.206911087036133, test_abs_avg=30.204256057739258
production_forward grad[23] vs paper_forward: mean_abs=0.5085821151733398, max_abs=2.1875, mean_rel=0.09522693604230881, max_rel=12.835071563720703, norm_rel=0.021709825843572617, ref_abs_avg=24.002460479736328, test_abs_avg=24.014209747314453
production_forward grad[24] vs paper_forward: mean_abs=0.6384824514389038, max_abs=4.0, mean_rel=0.14531970024108887, max_rel=1107.2962646484375, norm_rel=0.0218320582062006, ref_abs_avg=29.35013771057129, test_abs_avg=29.3524112701416
production_forward grad[25] vs paper_forward: mean_abs=0.621940016746521, max_abs=3.796875, mean_rel=0.14958912134170532, max_rel=1263.4736328125, norm_rel=0.02151733636856079, ref_abs_avg=29.056072235107422, test_abs_avg=29.05864143371582
production_forward grad[26] vs paper_forward: mean_abs=0.6222596168518066, max_abs=2.625, mean_rel=0.23418574035167694, max_rel=49.771331787109375, norm_rel=0.0244008619338274, ref_abs_avg=25.077348709106445, test_abs_avg=25.063243865966797
production_forward grad[27] vs paper_forward: mean_abs=0.7479664087295532, max_abs=5.5, mean_rel=0.15764909982681274, max_rel=1190.226806640625, norm_rel=0.023753859102725983, ref_abs_avg=31.624465942382812, test_abs_avg=31.62710189819336
production_forward grad[28] vs paper_forward: mean_abs=0.7310042381286621, max_abs=4.3125, mean_rel=0.15647809207439423, max_rel=906.3453979492188, norm_rel=0.02365405671298504, ref_abs_avg=31.048450469970703, test_abs_avg=31.045190811157227
production_forward grad[29] vs paper_forward: mean_abs=0.5949125289916992, max_abs=2.5625, mean_rel=0.13720566034317017, max_rel=14.565197944641113, norm_rel=0.0253599863499403, ref_abs_avg=24.086172103881836, test_abs_avg=24.133358001708984
production_forward grad[30] vs paper_forward: mean_abs=0.6954125761985779, max_abs=5.25, mean_rel=0.16116423904895782, max_rel=1070.290771484375, norm_rel=0.02408040128648281, ref_abs_avg=28.993606567382812, test_abs_avg=28.998046875
production_forward grad[31] vs paper_forward: mean_abs=0.679894208908081, max_abs=4.5, mean_rel=0.16585269570350647, max_rel=939.6921997070312, norm_rel=0.023738587275147438, ref_abs_avg=28.75722885131836, test_abs_avg=28.760360717773438
production_forward grad[32] vs paper_forward: mean_abs=0.5342750549316406, max_abs=2.25, mean_rel=0.20810244977474213, max_rel=28.45186996459961, norm_rel=0.025966158136725426, ref_abs_avg=20.723339080810547, test_abs_avg=20.73740005493164
production_forward grad[33] vs paper_forward: mean_abs=0.6352716088294983, max_abs=4.125, mean_rel=0.1528836339712143, max_rel=856.7762451171875, norm_rel=0.02394481562077999, ref_abs_avg=26.590370178222656, test_abs_avg=26.596553802490234
production_forward grad[34] vs paper_forward: mean_abs=0.6308916807174683, max_abs=4.375, mean_rel=0.18058258295059204, max_rel=2134.14306640625, norm_rel=0.023916885256767273, ref_abs_avg=26.421085357666016, test_abs_avg=26.425230026245117
production_forward grad[35] vs paper_forward: mean_abs=0.48142337799072266, max_abs=1.875, mean_rel=0.16724848747253418, max_rel=12.28480339050293, norm_rel=0.02280324138700962, ref_abs_avg=21.205181121826172, test_abs_avg=21.22841453552246
production_forward grad[36] vs paper_forward: mean_abs=0.6042397022247314, max_abs=4.5, mean_rel=0.1587960124015808, max_rel=1001.6351928710938, norm_rel=0.02376951463520527, ref_abs_avg=25.492572784423828, test_abs_avg=25.49482536315918
production_forward grad[37] vs paper_forward: mean_abs=0.5925073027610779, max_abs=4.0, mean_rel=0.14724145829677582, max_rel=550.4093627929688, norm_rel=0.02375546656548977, ref_abs_avg=25.033405303955078, test_abs_avg=25.032506942749023
production_forward grad[38] vs paper_forward: mean_abs=0.4708280563354492, max_abs=1.8125, mean_rel=0.2561317980289459, max_rel=84.89624786376953, norm_rel=0.024075647816061974, ref_abs_avg=19.618146896362305, test_abs_avg=19.59768295288086
production_forward grad[39] vs paper_forward: mean_abs=0.5676924586296082, max_abs=3.796875, mean_rel=0.15799300372600555, max_rel=985.4728393554688, norm_rel=0.023483332246541977, ref_abs_avg=24.238834381103516, test_abs_avg=24.241884231567383
production_forward grad[40] vs paper_forward: mean_abs=0.5494695901870728, max_abs=3.421875, mean_rel=0.15759539604187012, max_rel=811.0665283203125, norm_rel=0.023071931675076485, ref_abs_avg=23.84798240661621, test_abs_avg=23.847503662109375
production_forward grad[41] vs paper_forward: mean_abs=0.4306304454803467, max_abs=1.5, mean_rel=0.12292982637882233, max_rel=15.26295280456543, norm_rel=0.023082882165908813, ref_abs_avg=18.237451553344727, test_abs_avg=18.236833572387695
production_forward grad[42] vs paper_forward: mean_abs=0.5291920304298401, max_abs=3.5, mean_rel=0.1545741856098175, max_rel=856.6722412109375, norm_rel=0.023398462682962418, ref_abs_avg=22.6619930267334, test_abs_avg=22.665058135986328
production_forward grad[43] vs paper_forward: mean_abs=0.5277177095413208, max_abs=3.5, mean_rel=0.14244985580444336, max_rel=923.5548706054688, norm_rel=0.023353038355708122, ref_abs_avg=22.652517318725586, test_abs_avg=22.65700912475586
production_forward grad[44] vs paper_forward: mean_abs=0.41023755073547363, max_abs=1.5, mean_rel=0.49690914154052734, max_rel=192.63449096679688, norm_rel=0.02415912225842476, ref_abs_avg=17.121505737304688, test_abs_avg=17.165868759155273
production_forward grad[45] vs paper_forward: mean_abs=0.5075511336326599, max_abs=3.5, mean_rel=0.147172212600708, max_rel=1422.5733642578125, norm_rel=0.02306387946009636, ref_abs_avg=22.041044235229492, test_abs_avg=22.043960571289062
production_forward grad[46] vs paper_forward: mean_abs=0.4958646297454834, max_abs=3.75, mean_rel=0.14969421923160553, max_rel=980.4886474609375, norm_rel=0.022752825170755386, ref_abs_avg=21.825464248657227, test_abs_avg=21.82984161376953
production_forward grad[47] vs paper_forward: mean_abs=0.3949422836303711, max_abs=1.59765625, mean_rel=0.104546457529068, max_rel=6.420482635498047, norm_rel=0.022286316379904747, ref_abs_avg=17.379783630371094, test_abs_avg=17.405534744262695
production_forward grad[48] vs paper_forward: mean_abs=0.4843030571937561, max_abs=3.25, mean_rel=0.14657889306545258, max_rel=719.9827270507812, norm_rel=0.022811751812696457, ref_abs_avg=21.261371612548828, test_abs_avg=21.264453887939453
production_forward grad[49] vs paper_forward: mean_abs=0.4773087501525879, max_abs=3.5, mean_rel=0.1483747363090515, max_rel=1165.618408203125, norm_rel=0.022701088339090347, ref_abs_avg=21.071456909179688, test_abs_avg=21.070283889770508
production_forward grad[50] vs paper_forward: mean_abs=0.4631834030151367, max_abs=2.0, mean_rel=0.09065680205821991, max_rel=5.201590538024902, norm_rel=0.025967519730329514, ref_abs_avg=17.632322311401367, test_abs_avg=17.594051361083984
production_forward grad[51] vs paper_forward: mean_abs=0.5421870946884155, max_abs=3.75, mean_rel=0.16628140211105347, max_rel=1418.177734375, norm_rel=0.024339107796549797, ref_abs_avg=22.346303939819336, test_abs_avg=22.34827423095703
production_forward grad[52] vs paper_forward: mean_abs=0.5333936214447021, max_abs=3.310546875, mean_rel=0.15697437524795532, max_rel=809.6787719726562, norm_rel=0.024212080985307693, ref_abs_avg=22.09911346435547, test_abs_avg=22.09676742553711
production_forward grad[53] vs paper_forward: mean_abs=0.4327867031097412, max_abs=1.625, mean_rel=0.11999236792325974, max_rel=7.0794196128845215, norm_rel=0.025770485401153564, ref_abs_avg=16.824047088623047, test_abs_avg=16.830970764160156
production_forward grad[54] vs paper_forward: mean_abs=0.4979902505874634, max_abs=3.25, mean_rel=0.14516004920005798, max_rel=816.2911987304688, norm_rel=0.023844748735427856, ref_abs_avg=20.887598037719727, test_abs_avg=20.888477325439453
production_forward grad[55] vs paper_forward: mean_abs=0.48621493577957153, max_abs=3.1103515625, mean_rel=0.15395808219909668, max_rel=688.4654541015625, norm_rel=0.023787569254636765, ref_abs_avg=20.467557907104492, test_abs_avg=20.468812942504883
production_forward grad[56] vs paper_forward: mean_abs=0.3815915584564209, max_abs=1.375, mean_rel=0.0970786064863205, max_rel=5.3660454750061035, norm_rel=0.023780331015586853, ref_abs_avg=16.2299747467041, test_abs_avg=16.194639205932617
production_forward grad[57] vs paper_forward: mean_abs=0.4622916579246521, max_abs=3.5, mean_rel=0.15136167407035828, max_rel=606.1954345703125, norm_rel=0.023388182744383812, ref_abs_avg=19.78174591064453, test_abs_avg=19.78258514404297
production_forward grad[58] vs paper_forward: mean_abs=0.45381590723991394, max_abs=3.125, mean_rel=0.15475419163703918, max_rel=677.034423828125, norm_rel=0.02320370450615883, ref_abs_avg=19.574186325073242, test_abs_avg=19.570476531982422
production_forward grad[59] vs paper_forward: mean_abs=0.3539239168167114, max_abs=1.25, mean_rel=0.07710732519626617, max_rel=2.7997994422912598, norm_rel=0.022267291322350502, ref_abs_avg=15.888612747192383, test_abs_avg=15.915769577026367
production_forward grad[60] vs paper_forward: mean_abs=0.43325477838516235, max_abs=2.875, mean_rel=0.14275720715522766, max_rel=637.2179565429688, norm_rel=0.023052671924233437, ref_abs_avg=18.781192779541016, test_abs_avg=18.78221321105957
production_forward grad[61] vs paper_forward: mean_abs=0.42677223682403564, max_abs=3.125, mean_rel=0.15538650751113892, max_rel=664.1469116210938, norm_rel=0.02272033877670765, ref_abs_avg=18.771984100341797, test_abs_avg=18.773426055908203
production_forward grad[62] vs paper_forward: mean_abs=0.3263247013092041, max_abs=1.75, mean_rel=0.07024308294057846, max_rel=3.258396863937378, norm_rel=0.02283855900168419, ref_abs_avg=14.719035148620605, test_abs_avg=14.719254493713379
production_forward grad[63] vs paper_forward: mean_abs=0.4135950207710266, max_abs=3.0, mean_rel=0.13920524716377258, max_rel=713.07470703125, norm_rel=0.022503042593598366, ref_abs_avg=18.357707977294922, test_abs_avg=18.359981536865234
production_forward grad[64] vs paper_forward: mean_abs=0.40100985765457153, max_abs=3.25, mean_rel=0.13346436619758606, max_rel=506.3843688964844, norm_rel=0.022349700331687927, ref_abs_avg=17.94100570678711, test_abs_avg=17.941011428833008
production_forward grad[65] vs paper_forward: mean_abs=0.3329591751098633, max_abs=1.625, mean_rel=0.3631158471107483, max_rel=98.43681335449219, norm_rel=0.023130668327212334, ref_abs_avg=14.823203086853027, test_abs_avg=14.84102725982666
production_forward grad[66] vs paper_forward: mean_abs=0.39393186569213867, max_abs=2.734375, mean_rel=0.13823804259300232, max_rel=644.8538818359375, norm_rel=0.022125205025076866, ref_abs_avg=17.759254455566406, test_abs_avg=17.76096534729004
production_forward grad[67] vs paper_forward: mean_abs=0.3836841583251953, max_abs=2.875, mean_rel=0.14447394013404846, max_rel=675.5371704101562, norm_rel=0.02204042486846447, ref_abs_avg=17.407642364501953, test_abs_avg=17.41197967529297
production_forward grad[68] vs paper_forward: mean_abs=0.3055715560913086, max_abs=1.25, mean_rel=0.09136813879013062, max_rel=5.912243843078613, norm_rel=0.02183806337416172, ref_abs_avg=14.07787036895752, test_abs_avg=14.07681655883789
production_forward grad[69] vs paper_forward: mean_abs=0.3740918040275574, max_abs=3.0, mean_rel=0.1415938436985016, max_rel=437.2269592285156, norm_rel=0.021791312843561172, ref_abs_avg=17.164657592773438, test_abs_avg=17.165660858154297
production_forward grad[70] vs paper_forward: mean_abs=0.3633050322532654, max_abs=2.71875, mean_rel=0.13619008660316467, max_rel=720.9561157226562, norm_rel=0.021800972521305084, ref_abs_avg=16.679357528686523, test_abs_avg=16.682037353515625
production_forward grad[71] vs paper_forward: mean_abs=0.2975360155105591, max_abs=1.078125, mean_rel=0.19301918148994446, max_rel=63.591400146484375, norm_rel=0.021268632262945175, ref_abs_avg=13.896018981933594, test_abs_avg=13.900838851928711
production_forward grad[72] vs paper_forward: mean_abs=0.3555297553539276, max_abs=2.75, mean_rel=0.14070498943328857, max_rel=1127.51171875, norm_rel=0.021611005067825317, ref_abs_avg=16.41591453552246, test_abs_avg=16.41655158996582
production_forward grad[73] vs paper_forward: mean_abs=0.3501824736595154, max_abs=2.75, mean_rel=0.14109686017036438, max_rel=422.8034362792969, norm_rel=0.021394863724708557, ref_abs_avg=16.320755004882812, test_abs_avg=16.318710327148438
production_forward grad[74] vs paper_forward: mean_abs=0.33797311782836914, max_abs=1.25, mean_rel=0.09852129220962524, max_rel=6.327548027038574, norm_rel=0.022744670510292053, ref_abs_avg=14.956634521484375, test_abs_avg=14.946784973144531
production_forward grad[75] vs paper_forward: mean_abs=0.39279383420944214, max_abs=3.0, mean_rel=0.1412668526172638, max_rel=814.9581298828125, norm_rel=0.023059383034706116, ref_abs_avg=17.04755401611328, test_abs_avg=17.048799514770508
production_forward grad[76] vs paper_forward: mean_abs=0.3851088881492615, max_abs=3.0, mean_rel=0.15472710132598877, max_rel=585.2926635742188, norm_rel=0.02282523177564144, ref_abs_avg=16.884662628173828, test_abs_avg=16.890621185302734
production_forward grad[77] vs paper_forward: mean_abs=0.28306078910827637, max_abs=1.25, mean_rel=0.06812252849340439, max_rel=3.307666063308716, norm_rel=0.0209797490388155, ref_abs_avg=14.093037605285645, test_abs_avg=14.096731185913086
production_forward grad[78] vs paper_forward: mean_abs=0.36708804965019226, max_abs=3.0, mean_rel=0.14564356207847595, max_rel=1042.8455810546875, norm_rel=0.02273295447230339, ref_abs_avg=16.14815330505371, test_abs_avg=16.14986228942871
production_forward grad[79] vs paper_forward: mean_abs=0.3624515235424042, max_abs=3.59375, mean_rel=0.1465090662240982, max_rel=567.4790649414062, norm_rel=0.02296159788966179, ref_abs_avg=15.829172134399414, test_abs_avg=15.828916549682617
production_forward grad[80] vs paper_forward: mean_abs=0.2866213321685791, max_abs=1.375, mean_rel=0.12140738219022751, max_rel=11.318962097167969, norm_rel=0.022243738174438477, ref_abs_avg=12.771222114562988, test_abs_avg=12.755256652832031
production_forward grad[81] vs paper_forward: mean_abs=0.3321459889411926, max_abs=3.25, mean_rel=0.14561952650547028, max_rel=1448.9075927734375, norm_rel=0.021934831514954567, ref_abs_avg=15.163495063781738, test_abs_avg=15.164133071899414
production_forward grad[82] vs paper_forward: mean_abs=0.3323986530303955, max_abs=3.25, mean_rel=0.141336128115654, max_rel=589.2380981445312, norm_rel=0.021789290010929108, ref_abs_avg=15.335407257080078, test_abs_avg=15.330798149108887
production_forward grad[83] vs paper_forward: mean_abs=0.27272486686706543, max_abs=1.125, mean_rel=0.09035800397396088, max_rel=5.711326599121094, norm_rel=0.02148917317390442, ref_abs_avg=12.701225280761719, test_abs_avg=12.701152801513672
production_forward grad[84] vs paper_forward: mean_abs=0.31679338216781616, max_abs=2.75, mean_rel=0.14060868322849274, max_rel=833.4672241210938, norm_rel=0.021422289311885834, ref_abs_avg=14.79465389251709, test_abs_avg=14.796836853027344
production_forward grad[85] vs paper_forward: mean_abs=0.3076061010360718, max_abs=2.9375, mean_rel=0.13089519739151, max_rel=463.5381164550781, norm_rel=0.021221181377768517, ref_abs_avg=14.521806716918945, test_abs_avg=14.524124145507812
production_forward grad[86] vs paper_forward: mean_abs=0.24106256663799286, max_abs=0.90625, mean_rel=0.16943535208702087, max_rel=30.109642028808594, norm_rel=0.0204999428242445, ref_abs_avg=11.620211601257324, test_abs_avg=11.62858772277832
production_forward grad[87] vs paper_forward: mean_abs=0.29456692934036255, max_abs=2.78125, mean_rel=0.13107392191886902, max_rel=389.7644348144531, norm_rel=0.021077502518892288, ref_abs_avg=14.026880264282227, test_abs_avg=14.028173446655273
production_forward grad[88] vs paper_forward: mean_abs=0.2922065854072571, max_abs=2.75, mean_rel=0.13129468262195587, max_rel=447.095947265625, norm_rel=0.02112080156803131, ref_abs_avg=13.963897705078125, test_abs_avg=13.96546745300293
production_forward grad[89] vs paper_forward: mean_abs=0.2295283079147339, max_abs=0.9375, mean_rel=0.22137801349163055, max_rel=43.62296676635742, norm_rel=0.02004609815776348, ref_abs_avg=11.590911865234375, test_abs_avg=11.569375038146973
production_forward grad[90] vs paper_forward: mean_abs=0.2806592583656311, max_abs=2.8125, mean_rel=0.12546318769454956, max_rel=642.7575073242188, norm_rel=0.020406441763043404, ref_abs_avg=13.847569465637207, test_abs_avg=13.84857177734375
production_forward grad[91] vs paper_forward: mean_abs=0.2643401026725769, max_abs=2.5, mean_rel=0.1288522481918335, max_rel=630.1869506835938, norm_rel=0.020114995539188385, ref_abs_avg=13.234946250915527, test_abs_avg=13.237846374511719
production_forward grad[92] vs paper_forward: mean_abs=0.23470091819763184, max_abs=0.96875, mean_rel=0.055534008890390396, max_rel=2.611316680908203, norm_rel=0.021630821749567986, ref_abs_avg=11.109933853149414, test_abs_avg=11.11779499053955
production_forward grad[93] vs paper_forward: mean_abs=0.2624995708465576, max_abs=2.625, mean_rel=0.12264550477266312, max_rel=600.1494140625, norm_rel=0.020257361233234406, ref_abs_avg=13.098028182983398, test_abs_avg=13.099630355834961
production_forward grad[94] vs paper_forward: mean_abs=0.2541669011116028, max_abs=2.5, mean_rel=0.11795467883348465, max_rel=466.05462646484375, norm_rel=0.019134415313601494, ref_abs_avg=13.382716178894043, test_abs_avg=13.382637023925781
production_forward grad[95] vs paper_forward: mean_abs=0.2076563835144043, max_abs=0.8359375, mean_rel=0.08188333362340927, max_rel=4.8610405921936035, norm_rel=0.01910286396741867, ref_abs_avg=10.786245346069336, test_abs_avg=10.79124927520752
production_forward grad[96] vs paper_forward: mean_abs=0.2483115941286087, max_abs=3.25, mean_rel=0.12027646601200104, max_rel=598.028076171875, norm_rel=0.019383681938052177, ref_abs_avg=13.034200668334961, test_abs_avg=13.035172462463379
production_forward grad[97] vs paper_forward: mean_abs=0.23923081159591675, max_abs=2.49609375, mean_rel=0.11791486293077469, max_rel=641.3324584960938, norm_rel=0.01915431022644043, ref_abs_avg=12.713068962097168, test_abs_avg=12.713460922241211
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016830831300467253, max_abs=0.046875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008933992125093937, max_abs=0.466796875, mean_rel=0.07567892968654633, max_rel=95.3083267211914, norm_rel=0.020654266700148582, ref_abs_avg=0.4668831527233124, test_abs_avg=0.4668901264667511
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.3881659507751465, max_abs=38.75, mean_rel=0.17503441870212555, max_rel=160.52139282226562, norm_rel=0.021280106157064438, ref_abs_avg=228.2950897216797, test_abs_avg=228.27059936523438
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.9024477005004883, max_abs=3.546875, mean_rel=0.07380504906177521, max_rel=2.552661180496216, norm_rel=0.022335609421133995, ref_abs_avg=40.53485870361328, test_abs_avg=40.526878356933594
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.1349472999572754, max_abs=9.3125, mean_rel=0.154465913772583, max_rel=722.89599609375, norm_rel=0.023791123181581497, ref_abs_avg=47.95964431762695, test_abs_avg=47.96027374267578
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.1135914325714111, max_abs=7.5, mean_rel=0.1518055647611618, max_rel=844.0370483398438, norm_rel=0.02352713793516159, ref_abs_avg=47.58937072753906, test_abs_avg=47.58698654174805
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.7530899047851562, max_abs=3.0, mean_rel=0.12971927225589752, max_rel=13.192611694335938, norm_rel=0.022970564663410187, ref_abs_avg=32.12895965576172, test_abs_avg=32.19366455078125
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9863730669021606, max_abs=6.25, mean_rel=0.16610468924045563, max_rel=2076.92041015625, norm_rel=0.02346239611506462, ref_abs_avg=42.23849868774414, test_abs_avg=42.24163818359375
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.9587962627410889, max_abs=6.0, mean_rel=0.1507631540298462, max_rel=933.3502807617188, norm_rel=0.02308283932507038, ref_abs_avg=41.77604675292969, test_abs_avg=41.775146484375
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7263479232788086, max_abs=3.25, mean_rel=0.0823388397693634, max_rel=5.102280616760254, norm_rel=0.02219819463789463, ref_abs_avg=33.07938766479492, test_abs_avg=33.03787612915039
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.9000142812728882, max_abs=6.3125, mean_rel=0.15443885326385498, max_rel=1362.1727294921875, norm_rel=0.023161737248301506, ref_abs_avg=38.99879837036133, test_abs_avg=39.00124740600586
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.878711462020874, max_abs=5.5, mean_rel=0.1724221259355545, max_rel=2476.392578125, norm_rel=0.022965330630540848, ref_abs_avg=38.46556091308594, test_abs_avg=38.46293640136719
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.6461982727050781, max_abs=2.5, mean_rel=0.08832345902919769, max_rel=6.094268798828125, norm_rel=0.022046461701393127, ref_abs_avg=29.76634407043457, test_abs_avg=29.80690574645996
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.8243396282196045, max_abs=5.4375, mean_rel=0.16957169771194458, max_rel=2311.595947265625, norm_rel=0.023143064230680466, ref_abs_avg=35.787044525146484, test_abs_avg=35.78668212890625
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.8025591373443604, max_abs=5.0625, mean_rel=0.17024165391921997, max_rel=1930.0430908203125, norm_rel=0.02270304411649704, ref_abs_avg=35.511207580566406, test_abs_avg=35.5103759765625
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.6195919513702393, max_abs=2.25, mean_rel=0.12090249359607697, max_rel=26.295228958129883, norm_rel=0.023337818682193756, ref_abs_avg=26.81689453125, test_abs_avg=26.84521484375
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7654775381088257, max_abs=5.5, mean_rel=0.15609776973724365, max_rel=748.7322387695312, norm_rel=0.022857660427689552, ref_abs_avg=33.61362075805664, test_abs_avg=33.615089416503906
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.7463066577911377, max_abs=4.5, mean_rel=0.15111705660820007, max_rel=945.08447265625, norm_rel=0.022575581446290016, ref_abs_avg=33.20844650268555, test_abs_avg=33.205810546875
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.5614471435546875, max_abs=2.25, mean_rel=0.09065435826778412, max_rel=7.921998500823975, norm_rel=0.02229442074894905, ref_abs_avg=25.39047622680664, test_abs_avg=25.401569366455078
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.7184450030326843, max_abs=4.5, mean_rel=0.1500132828950882, max_rel=1411.3782958984375, norm_rel=0.02268018014729023, ref_abs_avg=31.80097770690918, test_abs_avg=31.80337142944336
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.7087475657463074, max_abs=4.5, mean_rel=0.15118807554244995, max_rel=1003.2724609375, norm_rel=0.022528598085045815, ref_abs_avg=31.59693145751953, test_abs_avg=31.59873390197754
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5559952855110168, max_abs=2.5, mean_rel=0.41091203689575195, max_rel=161.0507354736328, norm_rel=0.022595180198550224, ref_abs_avg=25.217308044433594, test_abs_avg=25.176334381103516
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6855210065841675, max_abs=4.03125, mean_rel=0.15152432024478912, max_rel=1076.42626953125, norm_rel=0.022577621042728424, ref_abs_avg=30.483421325683594, test_abs_avg=30.485454559326172
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6684049367904663, max_abs=4.0, mean_rel=0.155411034822464, max_rel=879.0859375, norm_rel=0.022219983860850334, ref_abs_avg=30.206911087036133, test_abs_avg=30.20391082763672
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.49912405014038086, max_abs=2.0, mean_rel=0.09095703810453415, max_rel=12.963022232055664, norm_rel=0.021211083978414536, ref_abs_avg=24.002460479736328, test_abs_avg=24.010530471801758
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.6550332307815552, max_abs=4.25, mean_rel=0.14907902479171753, max_rel=1300.9857177734375, norm_rel=0.022399447858333588, ref_abs_avg=29.35013771057129, test_abs_avg=29.351112365722656
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.6385344862937927, max_abs=4.0, mean_rel=0.14673981070518494, max_rel=1176.90380859375, norm_rel=0.022064276039600372, ref_abs_avg=29.056072235107422, test_abs_avg=29.057613372802734
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6520481109619141, max_abs=2.75, mean_rel=0.2649374008178711, max_rel=55.79194259643555, norm_rel=0.025413569062948227, ref_abs_avg=25.077348709106445, test_abs_avg=25.06882095336914
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7682765126228333, max_abs=6.5, mean_rel=0.16192835569381714, max_rel=1545.21630859375, norm_rel=0.024394787847995758, ref_abs_avg=31.624465942382812, test_abs_avg=31.625165939331055
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.753509521484375, max_abs=4.75, mean_rel=0.1642960160970688, max_rel=770.6778564453125, norm_rel=0.024366697296500206, ref_abs_avg=31.048450469970703, test_abs_avg=31.046335220336914
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.6165047883987427, max_abs=3.0, mean_rel=0.10836848616600037, max_rel=13.062833786010742, norm_rel=0.02629605494439602, ref_abs_avg=24.086172103881836, test_abs_avg=24.124927520751953
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.7119213342666626, max_abs=5.5, mean_rel=0.16386902332305908, max_rel=1246.336669921875, norm_rel=0.02464073896408081, ref_abs_avg=28.993606567382812, test_abs_avg=28.99597930908203
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6972610950469971, max_abs=4.625, mean_rel=0.1689269244670868, max_rel=667.1539916992188, norm_rel=0.024340666830539703, ref_abs_avg=28.75722885131836, test_abs_avg=28.758853912353516
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5560463666915894, max_abs=2.25, mean_rel=0.17108431458473206, max_rel=29.06965446472168, norm_rel=0.026957308873534203, ref_abs_avg=20.723339080810547, test_abs_avg=20.733509063720703
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.651463508605957, max_abs=4.0, mean_rel=0.16016140580177307, max_rel=1095.8538818359375, norm_rel=0.024547267705202103, ref_abs_avg=26.590370178222656, test_abs_avg=26.597126007080078
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6443221569061279, max_abs=4.4375, mean_rel=0.18330609798431396, max_rel=2416.02978515625, norm_rel=0.024425286799669266, ref_abs_avg=26.421085357666016, test_abs_avg=26.423717498779297
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.5089688897132874, max_abs=1.71875, mean_rel=0.17680424451828003, max_rel=12.736659049987793, norm_rel=0.023425016552209854, ref_abs_avg=21.205181121826172, test_abs_avg=21.21757698059082
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.6169067621231079, max_abs=4.25, mean_rel=0.1617640256881714, max_rel=1020.410400390625, norm_rel=0.024264665320515633, ref_abs_avg=25.492572784423828, test_abs_avg=25.493240356445312
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.6032044887542725, max_abs=4.625, mean_rel=0.15123507380485535, max_rel=574.54736328125, norm_rel=0.02417895384132862, ref_abs_avg=25.033405303955078, test_abs_avg=25.029708862304688
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.4778921604156494, max_abs=2.2265625, mean_rel=0.2003137767314911, max_rel=54.569034576416016, norm_rel=0.02466432936489582, ref_abs_avg=19.618146896362305, test_abs_avg=19.576934814453125
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5788931846618652, max_abs=4.0625, mean_rel=0.1626024842262268, max_rel=1239.154296875, norm_rel=0.02393723838031292, ref_abs_avg=24.238834381103516, test_abs_avg=24.241487503051758
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5617972612380981, max_abs=3.59375, mean_rel=0.1611979454755783, max_rel=709.8221435546875, norm_rel=0.02360430546104908, ref_abs_avg=23.84798240661621, test_abs_avg=23.847801208496094
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4444856643676758, max_abs=1.625, mean_rel=0.14346729218959808, max_rel=16.081192016601562, norm_rel=0.023866791278123856, ref_abs_avg=18.237451553344727, test_abs_avg=18.24117660522461
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5390323996543884, max_abs=4.0, mean_rel=0.15431112051010132, max_rel=915.5468139648438, norm_rel=0.02382628247141838, ref_abs_avg=22.6619930267334, test_abs_avg=22.66451072692871
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5366706848144531, max_abs=3.609375, mean_rel=0.143500417470932, max_rel=756.504638671875, norm_rel=0.023744450882077217, ref_abs_avg=22.652517318725586, test_abs_avg=22.656352996826172
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.42328667640686035, max_abs=1.625, mean_rel=0.5194249749183655, max_rel=207.39463806152344, norm_rel=0.02468698099255562, ref_abs_avg=17.121505737304688, test_abs_avg=17.18096923828125
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.516504168510437, max_abs=3.375, mean_rel=0.15287436544895172, max_rel=1352.5009765625, norm_rel=0.023456258699297905, ref_abs_avg=22.041044235229492, test_abs_avg=22.04369354248047
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.5077048540115356, max_abs=3.75, mean_rel=0.15620285272598267, max_rel=890.9397583007812, norm_rel=0.023280000314116478, ref_abs_avg=21.825464248657227, test_abs_avg=21.829418182373047
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.4103536605834961, max_abs=1.625, mean_rel=0.09494543820619583, max_rel=6.647200584411621, norm_rel=0.02341006137430668, ref_abs_avg=17.379783630371094, test_abs_avg=17.405839920043945
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.49189871549606323, max_abs=3.25, mean_rel=0.14809933304786682, max_rel=524.603271484375, norm_rel=0.023160012438893318, ref_abs_avg=21.261371612548828, test_abs_avg=21.263954162597656
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.4861346483230591, max_abs=3.25, mean_rel=0.15166926383972168, max_rel=1025.5128173828125, norm_rel=0.023140598088502884, ref_abs_avg=21.071456909179688, test_abs_avg=21.070926666259766
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.45673322677612305, max_abs=2.125, mean_rel=0.09466754645109177, max_rel=5.663954257965088, norm_rel=0.02585313655436039, ref_abs_avg=17.632322311401367, test_abs_avg=17.60503578186035
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5528044104576111, max_abs=4.0, mean_rel=0.16950225830078125, max_rel=1396.24609375, norm_rel=0.02480568364262581, ref_abs_avg=22.346303939819336, test_abs_avg=22.347938537597656
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.543957531452179, max_abs=3.625, mean_rel=0.16248445212841034, max_rel=779.819091796875, norm_rel=0.024720806628465652, ref_abs_avg=22.09911346435547, test_abs_avg=22.09640884399414
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.42490145564079285, max_abs=1.5, mean_rel=0.12584929168224335, max_rel=9.776341438293457, norm_rel=0.025244031101465225, ref_abs_avg=16.824047088623047, test_abs_avg=16.82782745361328
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.5059416890144348, max_abs=3.1875, mean_rel=0.1471269726753235, max_rel=772.3165283203125, norm_rel=0.024224234744906425, ref_abs_avg=20.887598037719727, test_abs_avg=20.8892822265625
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.4954225718975067, max_abs=3.1552734375, mean_rel=0.16053196787834167, max_rel=645.6483154296875, norm_rel=0.024225426837801933, ref_abs_avg=20.467557907104492, test_abs_avg=20.46664810180664
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.36620283126831055, max_abs=1.703125, mean_rel=0.1044025719165802, max_rel=14.165858268737793, norm_rel=0.02298853173851967, ref_abs_avg=16.2299747467041, test_abs_avg=16.20950698852539
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.46947726607322693, max_abs=3.5, mean_rel=0.1515163779258728, max_rel=650.8001708984375, norm_rel=0.023749811574816704, ref_abs_avg=19.78174591064453, test_abs_avg=19.783172607421875
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.4608324468135834, max_abs=3.0, mean_rel=0.15163734555244446, max_rel=468.3678894042969, norm_rel=0.02356962114572525, ref_abs_avg=19.574186325073242, test_abs_avg=19.57135772705078
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.35022783279418945, max_abs=1.25, mean_rel=0.0792674571275711, max_rel=3.6594395637512207, norm_rel=0.022298496216535568, ref_abs_avg=15.888612747192383, test_abs_avg=15.906246185302734
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.4392772912979126, max_abs=3.046875, mean_rel=0.14502716064453125, max_rel=749.238037109375, norm_rel=0.02337665483355522, ref_abs_avg=18.781192779541016, test_abs_avg=18.78217315673828
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.43394920229911804, max_abs=3.03125, mean_rel=0.1552884727716446, max_rel=683.0474243164062, norm_rel=0.02311268448829651, ref_abs_avg=18.771984100341797, test_abs_avg=18.77454376220703
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.33624696731567383, max_abs=1.75, mean_rel=0.0753818154335022, max_rel=3.3536694049835205, norm_rel=0.023411398753523827, ref_abs_avg=14.719035148620605, test_abs_avg=14.727947235107422
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.4192782938480377, max_abs=3.125, mean_rel=0.1425483524799347, max_rel=695.9306030273438, norm_rel=0.02279231697320938, ref_abs_avg=18.357707977294922, test_abs_avg=18.36013412475586
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.40795475244522095, max_abs=2.96875, mean_rel=0.13676561415195465, max_rel=509.5680236816406, norm_rel=0.022738611325621605, ref_abs_avg=17.94100570678711, test_abs_avg=17.939090728759766
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.33435606956481934, max_abs=1.625, mean_rel=0.37298688292503357, max_rel=81.98477172851562, norm_rel=0.02332417480647564, ref_abs_avg=14.823203086853027, test_abs_avg=14.83755874633789
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.39862728118896484, max_abs=3.0, mean_rel=0.1415485143661499, max_rel=841.4492797851562, norm_rel=0.022389963269233704, ref_abs_avg=17.759254455566406, test_abs_avg=17.760332107543945
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.38647574186325073, max_abs=2.75, mean_rel=0.14404426515102386, max_rel=555.781005859375, norm_rel=0.022210679948329926, ref_abs_avg=17.407642364501953, test_abs_avg=17.41009521484375
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.3140011131763458, max_abs=1.25, mean_rel=0.08754943311214447, max_rel=4.997110366821289, norm_rel=0.022016305476427078, ref_abs_avg=14.07787036895752, test_abs_avg=14.09285831451416
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.3781552314758301, max_abs=2.75, mean_rel=0.1425008475780487, max_rel=413.4039611816406, norm_rel=0.022010646760463715, ref_abs_avg=17.164657592773438, test_abs_avg=17.16558074951172
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.37043920159339905, max_abs=2.640625, mean_rel=0.13581405580043793, max_rel=472.1036071777344, norm_rel=0.022242948412895203, ref_abs_avg=16.679357528686523, test_abs_avg=16.68247413635254
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.30027711391448975, max_abs=1.3125, mean_rel=0.15504440665245056, max_rel=43.85551071166992, norm_rel=0.02169005014002323, ref_abs_avg=13.896018981933594, test_abs_avg=13.898162841796875
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.3586166203022003, max_abs=2.75, mean_rel=0.14185205101966858, max_rel=1058.9483642578125, norm_rel=0.021810872480273247, ref_abs_avg=16.41591453552246, test_abs_avg=16.416812896728516
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.3539218008518219, max_abs=2.875, mean_rel=0.13909518718719482, max_rel=492.506591796875, norm_rel=0.021621672436594963, ref_abs_avg=16.320755004882812, test_abs_avg=16.31800079345703
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.3502826690673828, max_abs=1.1875, mean_rel=0.10060926526784897, max_rel=3.2461471557617188, norm_rel=0.023341599851846695, ref_abs_avg=14.956634521484375, test_abs_avg=14.9415283203125
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.3974938988685608, max_abs=3.375, mean_rel=0.14369280636310577, max_rel=996.0675048828125, norm_rel=0.023314135149121284, ref_abs_avg=17.04755401611328, test_abs_avg=17.048141479492188
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.39083027839660645, max_abs=3.5, mean_rel=0.155781552195549, max_rel=500.13671875, norm_rel=0.02315816469490528, ref_abs_avg=16.884662628173828, test_abs_avg=16.890335083007812
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.2859128713607788, max_abs=1.03125, mean_rel=0.06500796973705292, max_rel=6.370713710784912, norm_rel=0.021337905898690224, ref_abs_avg=14.093037605285645, test_abs_avg=14.104839324951172
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.3711511194705963, max_abs=3.0, mean_rel=0.14939840137958527, max_rel=1294.7830810546875, norm_rel=0.022970128804445267, ref_abs_avg=16.14815330505371, test_abs_avg=16.149009704589844
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.3664599061012268, max_abs=3.375, mean_rel=0.15164363384246826, max_rel=564.8744506835938, norm_rel=0.023204095661640167, ref_abs_avg=15.829172134399414, test_abs_avg=15.828495025634766
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.29344844818115234, max_abs=1.125, mean_rel=0.12443240731954575, max_rel=14.660594940185547, norm_rel=0.022904735058546066, ref_abs_avg=12.771222114562988, test_abs_avg=12.7510986328125
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.33563655614852905, max_abs=3.5, mean_rel=0.14892162382602692, max_rel=881.6031494140625, norm_rel=0.022145522758364677, ref_abs_avg=15.163495063781738, test_abs_avg=15.163644790649414
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.33675524592399597, max_abs=3.0, mean_rel=0.14572514593601227, max_rel=511.51739501953125, norm_rel=0.022040007635951042, ref_abs_avg=15.335407257080078, test_abs_avg=15.332664489746094
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.2703068256378174, max_abs=1.0, mean_rel=0.1291532814502716, max_rel=13.968315124511719, norm_rel=0.021261587738990784, ref_abs_avg=12.701225280761719, test_abs_avg=12.696128845214844
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.3191906213760376, max_abs=2.75, mean_rel=0.14008979499340057, max_rel=847.535888671875, norm_rel=0.021570172160863876, ref_abs_avg=14.79465389251709, test_abs_avg=14.796422958374023
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.3107456564903259, max_abs=2.5, mean_rel=0.13068872690200806, max_rel=381.6475830078125, norm_rel=0.02141374535858631, ref_abs_avg=14.521806716918945, test_abs_avg=14.525659561157227
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.2484799027442932, max_abs=0.921875, mean_rel=0.22587692737579346, max_rel=63.94944763183594, norm_rel=0.021227270364761353, ref_abs_avg=11.620211601257324, test_abs_avg=11.630093574523926
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.2964794635772705, max_abs=2.5, mean_rel=0.1316027194261551, max_rel=469.0956115722656, norm_rel=0.021208804100751877, ref_abs_avg=14.026880264282227, test_abs_avg=14.027027130126953
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.29368340969085693, max_abs=2.75, mean_rel=0.1301005631685257, max_rel=324.8979187011719, norm_rel=0.021186774596571922, ref_abs_avg=13.963897705078125, test_abs_avg=13.963445663452148
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.23388659954071045, max_abs=0.96875, mean_rel=0.15953883528709412, max_rel=20.492599487304688, norm_rel=0.020093221217393875, ref_abs_avg=11.590911865234375, test_abs_avg=11.571223258972168
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.2817486822605133, max_abs=2.625, mean_rel=0.12591920793056488, max_rel=513.3219604492188, norm_rel=0.020466769114136696, ref_abs_avg=13.847569465637207, test_abs_avg=13.848244667053223
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.2684870660305023, max_abs=2.1875, mean_rel=0.12828350067138672, max_rel=446.8669128417969, norm_rel=0.020452477037906647, ref_abs_avg=13.234946250915527, test_abs_avg=13.2365083694458
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.234222412109375, max_abs=0.84375, mean_rel=0.06214768439531326, max_rel=3.327563524246216, norm_rel=0.021380817517638206, ref_abs_avg=11.109933853149414, test_abs_avg=11.118904113769531
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.2631677985191345, max_abs=2.75, mean_rel=0.12275733053684235, max_rel=452.7768249511719, norm_rel=0.020300593227148056, ref_abs_avg=13.098028182983398, test_abs_avg=13.09892463684082
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.25479450821876526, max_abs=2.5625, mean_rel=0.12023063004016876, max_rel=397.8559265136719, norm_rel=0.019137859344482422, ref_abs_avg=13.382716178894043, test_abs_avg=13.38156509399414
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.20173394680023193, max_abs=0.9375, mean_rel=0.08124072104692459, max_rel=6.9690375328063965, norm_rel=0.018942121416330338, ref_abs_avg=10.786245346069336, test_abs_avg=10.79393196105957
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.2485748529434204, max_abs=2.75, mean_rel=0.11878491938114166, max_rel=620.4367065429688, norm_rel=0.01940579153597355, ref_abs_avg=13.034200668334961, test_abs_avg=13.035255432128906
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.23697365820407867, max_abs=2.5, mean_rel=0.11462350189685822, max_rel=521.6539306640625, norm_rel=0.01901157572865486, ref_abs_avg=12.713068962097168, test_abs_avg=12.711920738220215
liger_forward vs paper_forward output: mean_abs=0.0001537073840154335, max_abs=0.03125
liger_forward grad[0] vs paper_forward: mean_abs=0.0035820454359054565, max_abs=0.2177734375, mean_rel=0.02560785785317421, max_rel=48.25786209106445, norm_rel=0.009600227698683739, ref_abs_avg=0.4668831527233124, test_abs_avg=0.4668593406677246
liger_forward grad[1] vs paper_forward: mean_abs=1.6157031059265137, max_abs=16.0, mean_rel=0.05373819172382355, max_rel=91.36267852783203, norm_rel=0.006790286861360073, ref_abs_avg=228.2950897216797, test_abs_avg=228.28172302246094
liger_forward grad[2] vs paper_forward: mean_abs=0.341888427734375, max_abs=1.376953125, mean_rel=0.02733352780342102, max_rel=0.8308662176132202, norm_rel=0.008602673187851906, ref_abs_avg=40.53485870361328, test_abs_avg=40.525840759277344
liger_forward grad[3] vs paper_forward: mean_abs=0.40475666522979736, max_abs=3.0, mean_rel=0.05824398249387741, max_rel=409.3996276855469, norm_rel=0.008764372207224369, ref_abs_avg=47.95964431762695, test_abs_avg=47.957984924316406
liger_forward grad[4] vs paper_forward: mean_abs=0.38851186633110046, max_abs=2.875, mean_rel=0.055709924548864365, max_rel=364.7967834472656, norm_rel=0.008499749936163425, ref_abs_avg=47.58937072753906, test_abs_avg=47.58601379394531
liger_forward grad[5] vs paper_forward: mean_abs=0.28419017791748047, max_abs=1.0625, mean_rel=0.05106429010629654, max_rel=5.277044773101807, norm_rel=0.008899721316993237, ref_abs_avg=32.12895965576172, test_abs_avg=32.117733001708984
liger_forward grad[6] vs paper_forward: mean_abs=0.34531188011169434, max_abs=2.25, mean_rel=0.05948325991630554, max_rel=521.7516479492188, norm_rel=0.00850446056574583, ref_abs_avg=42.23849868774414, test_abs_avg=42.23883056640625
liger_forward grad[7] vs paper_forward: mean_abs=0.333096444606781, max_abs=2.0, mean_rel=0.0525803342461586, max_rel=280.2000732421875, norm_rel=0.00831631664186716, ref_abs_avg=41.77604675292969, test_abs_avg=41.775394439697266
liger_forward grad[8] vs paper_forward: mean_abs=0.25668585300445557, max_abs=1.0, mean_rel=0.023945627734065056, max_rel=1.0225119590759277, norm_rel=0.008025366812944412, ref_abs_avg=33.07938766479492, test_abs_avg=33.07776641845703
liger_forward grad[9] vs paper_forward: mean_abs=0.31288617849349976, max_abs=2.125, mean_rel=0.055747248232364655, max_rel=727.6487426757812, norm_rel=0.008348952978849411, ref_abs_avg=38.99879837036133, test_abs_avg=38.99888229370117
liger_forward grad[10] vs paper_forward: mean_abs=0.29939815402030945, max_abs=2.0, mean_rel=0.05271955206990242, max_rel=240.42535400390625, norm_rel=0.008139394223690033, ref_abs_avg=38.46556091308594, test_abs_avg=38.46555709838867
liger_forward grad[11] vs paper_forward: mean_abs=0.2275104522705078, max_abs=0.9765625, mean_rel=0.02903931587934494, max_rel=1.1345925331115723, norm_rel=0.008142875507473946, ref_abs_avg=29.76634407043457, test_abs_avg=29.75985336303711
liger_forward grad[12] vs paper_forward: mean_abs=0.28395095467567444, max_abs=2.0, mean_rel=0.05837160348892212, max_rel=883.8213500976562, norm_rel=0.008273442275822163, ref_abs_avg=35.787044525146484, test_abs_avg=35.787925720214844
liger_forward grad[13] vs paper_forward: mean_abs=0.27307450771331787, max_abs=1.625, mean_rel=0.05858473479747772, max_rel=615.1561279296875, norm_rel=0.008029406890273094, ref_abs_avg=35.511207580566406, test_abs_avg=35.509281158447266
liger_forward grad[14] vs paper_forward: mean_abs=0.20657086372375488, max_abs=1.0, mean_rel=0.06938894093036652, max_rel=23.378358840942383, norm_rel=0.008352038450539112, ref_abs_avg=26.81689453125, test_abs_avg=26.81563949584961
liger_forward grad[15] vs paper_forward: mean_abs=0.2603852152824402, max_abs=1.75, mean_rel=0.05139236897230148, max_rel=417.3638610839844, norm_rel=0.008081327192485332, ref_abs_avg=33.61362075805664, test_abs_avg=33.613433837890625
liger_forward grad[16] vs paper_forward: mean_abs=0.25130221247673035, max_abs=1.75, mean_rel=0.054251570254564285, max_rel=414.88214111328125, norm_rel=0.00791983213275671, ref_abs_avg=33.20844650268555, test_abs_avg=33.20819854736328
liger_forward grad[17] vs paper_forward: mean_abs=0.20587682723999023, max_abs=0.75, mean_rel=0.03681621700525284, max_rel=3.9609992504119873, norm_rel=0.008239824324846268, ref_abs_avg=25.39047622680664, test_abs_avg=25.371477127075195
liger_forward grad[18] vs paper_forward: mean_abs=0.24138687551021576, max_abs=1.640625, mean_rel=0.050422824919223785, max_rel=445.6968078613281, norm_rel=0.007936344482004642, ref_abs_avg=31.80097770690918, test_abs_avg=31.800701141357422
liger_forward grad[19] vs paper_forward: mean_abs=0.2332962155342102, max_abs=1.5, mean_rel=0.04930747672915459, max_rel=312.92926025390625, norm_rel=0.007737525273114443, ref_abs_avg=31.59693145751953, test_abs_avg=31.59752655029297
liger_forward grad[20] vs paper_forward: mean_abs=0.17797774076461792, max_abs=0.75, mean_rel=0.4826086759567261, max_rel=230.70375061035156, norm_rel=0.007462594658136368, ref_abs_avg=25.217308044433594, test_abs_avg=25.22536277770996
liger_forward grad[21] vs paper_forward: mean_abs=0.22835500538349152, max_abs=1.5, mean_rel=0.050223927944898605, max_rel=356.7945556640625, norm_rel=0.007843687199056149, ref_abs_avg=30.483421325683594, test_abs_avg=30.483776092529297
liger_forward grad[22] vs paper_forward: mean_abs=0.21883314847946167, max_abs=1.5, mean_rel=0.05050210654735565, max_rel=357.5163879394531, norm_rel=0.007611861452460289, ref_abs_avg=30.206911087036133, test_abs_avg=30.20418930053711
liger_forward grad[23] vs paper_forward: mean_abs=0.17243242263793945, max_abs=0.53125, mean_rel=0.0393330454826355, max_rel=6.709423542022705, norm_rel=0.007362190168350935, ref_abs_avg=24.002460479736328, test_abs_avg=24.01012420654297
liger_forward grad[24] vs paper_forward: mean_abs=0.215985968708992, max_abs=1.5, mean_rel=0.049151185899972916, max_rel=499.42626953125, norm_rel=0.007706847973167896, ref_abs_avg=29.35013771057129, test_abs_avg=29.349946975708008
liger_forward grad[25] vs paper_forward: mean_abs=0.20671486854553223, max_abs=1.5, mean_rel=0.04876547306776047, max_rel=311.2054748535156, norm_rel=0.0074789756909012794, ref_abs_avg=29.056072235107422, test_abs_avg=29.055763244628906
liger_forward grad[26] vs paper_forward: mean_abs=0.19379544258117676, max_abs=0.75, mean_rel=0.03555339202284813, max_rel=4.4182281494140625, norm_rel=0.008014755323529243, ref_abs_avg=25.077348709106445, test_abs_avg=25.080516815185547
liger_forward grad[27] vs paper_forward: mean_abs=0.23980997502803802, max_abs=1.578125, mean_rel=0.052134882658720016, max_rel=375.9781494140625, norm_rel=0.00793482642620802, ref_abs_avg=31.624465942382812, test_abs_avg=31.624610900878906
liger_forward grad[28] vs paper_forward: mean_abs=0.2282094806432724, max_abs=1.5, mean_rel=0.049980901181697845, max_rel=236.64512634277344, norm_rel=0.007716281805187464, ref_abs_avg=31.048450469970703, test_abs_avg=31.048614501953125
liger_forward grad[29] vs paper_forward: mean_abs=0.17685365676879883, max_abs=0.625, mean_rel=0.04400571808218956, max_rel=8.675125122070312, norm_rel=0.007700924761593342, ref_abs_avg=24.086172103881836, test_abs_avg=24.096904754638672
liger_forward grad[30] vs paper_forward: mean_abs=0.2129497081041336, max_abs=1.625, mean_rel=0.05083638057112694, max_rel=522.1009521484375, norm_rel=0.007706116419285536, ref_abs_avg=28.993606567382812, test_abs_avg=28.9930419921875
liger_forward grad[31] vs paper_forward: mean_abs=0.2055400311946869, max_abs=1.5, mean_rel=0.04819025844335556, max_rel=242.3654022216797, norm_rel=0.007516408339142799, ref_abs_avg=28.75722885131836, test_abs_avg=28.757225036621094
liger_forward grad[32] vs paper_forward: mean_abs=0.1756279468536377, max_abs=0.65625, mean_rel=0.0490894615650177, max_rel=8.186088562011719, norm_rel=0.008473309688270092, ref_abs_avg=20.723339080810547, test_abs_avg=20.711589813232422
liger_forward grad[33] vs paper_forward: mean_abs=0.1912367045879364, max_abs=1.25, mean_rel=0.045159026980400085, max_rel=297.12225341796875, norm_rel=0.007549950387328863, ref_abs_avg=26.590370178222656, test_abs_avg=26.590778350830078
liger_forward grad[34] vs paper_forward: mean_abs=0.18507865071296692, max_abs=1.25, mean_rel=0.05014470964670181, max_rel=278.0049133300781, norm_rel=0.007379327435046434, ref_abs_avg=26.421085357666016, test_abs_avg=26.42202377319336
liger_forward grad[35] vs paper_forward: mean_abs=0.14516448974609375, max_abs=0.544921875, mean_rel=0.05712571367621422, max_rel=5.655895709991455, norm_rel=0.007189163006842136, ref_abs_avg=21.205181121826172, test_abs_avg=21.208820343017578
liger_forward grad[36] vs paper_forward: mean_abs=0.18017330765724182, max_abs=1.5, mean_rel=0.04707729071378708, max_rel=238.17385864257812, norm_rel=0.007438632193952799, ref_abs_avg=25.492572784423828, test_abs_avg=25.492834091186523
liger_forward grad[37] vs paper_forward: mean_abs=0.1718735694885254, max_abs=1.25, mean_rel=0.043689314275979996, max_rel=161.2659912109375, norm_rel=0.007236718665808439, ref_abs_avg=25.033405303955078, test_abs_avg=25.0325870513916
liger_forward grad[38] vs paper_forward: mean_abs=0.13148510456085205, max_abs=0.53125, mean_rel=0.02554347552359104, max_rel=2.1510872840881348, norm_rel=0.007168853189796209, ref_abs_avg=19.618146896362305, test_abs_avg=19.613235473632812
liger_forward grad[39] vs paper_forward: mean_abs=0.16768097877502441, max_abs=1.5, mean_rel=0.04674924910068512, max_rel=295.2970275878906, norm_rel=0.007284159772098064, ref_abs_avg=24.238834381103516, test_abs_avg=24.23948097229004
liger_forward grad[40] vs paper_forward: mean_abs=0.1606016457080841, max_abs=1.0, mean_rel=0.04511905461549759, max_rel=265.5496826171875, norm_rel=0.007111165206879377, ref_abs_avg=23.84798240661621, test_abs_avg=23.8482666015625
liger_forward grad[41] vs paper_forward: mean_abs=0.12336349487304688, max_abs=0.5, mean_rel=0.03636925667524338, max_rel=4.932681560516357, norm_rel=0.007083892822265625, ref_abs_avg=18.237451553344727, test_abs_avg=18.230642318725586
liger_forward grad[42] vs paper_forward: mean_abs=0.15505450963974, max_abs=1.0, mean_rel=0.045502714812755585, max_rel=260.8267517089844, norm_rel=0.007213612552732229, ref_abs_avg=22.6619930267334, test_abs_avg=22.662151336669922
liger_forward grad[43] vs paper_forward: mean_abs=0.15072724223136902, max_abs=1.0, mean_rel=0.04113391041755676, max_rel=208.2535858154297, norm_rel=0.0070401206612586975, ref_abs_avg=22.652517318725586, test_abs_avg=22.6518497467041
liger_forward grad[44] vs paper_forward: mean_abs=0.11675238609313965, max_abs=0.625, mean_rel=0.036632832139730453, max_rel=3.019026756286621, norm_rel=0.007368102204054594, ref_abs_avg=17.121505737304688, test_abs_avg=17.1248779296875
liger_forward grad[45] vs paper_forward: mean_abs=0.14695854485034943, max_abs=1.125, mean_rel=0.0429188534617424, max_rel=157.32931518554688, norm_rel=0.007037454284727573, ref_abs_avg=22.041044235229492, test_abs_avg=22.040861129760742
liger_forward grad[46] vs paper_forward: mean_abs=0.14154207706451416, max_abs=1.0, mean_rel=0.04238220304250717, max_rel=205.42465209960938, norm_rel=0.006867951713502407, ref_abs_avg=21.825464248657227, test_abs_avg=21.826398849487305
liger_forward grad[47] vs paper_forward: mean_abs=0.11402606964111328, max_abs=0.5, mean_rel=0.025398459285497665, max_rel=1.9347552061080933, norm_rel=0.006911322940140963, ref_abs_avg=17.379783630371094, test_abs_avg=17.37554168701172
liger_forward grad[48] vs paper_forward: mean_abs=0.14109882712364197, max_abs=1.0, mean_rel=0.042544059455394745, max_rel=230.79345703125, norm_rel=0.007005617022514343, ref_abs_avg=21.261371612548828, test_abs_avg=21.26114273071289
liger_forward grad[49] vs paper_forward: mean_abs=0.13527216017246246, max_abs=1.0, mean_rel=0.04283079504966736, max_rel=147.7682342529297, norm_rel=0.006809521000832319, ref_abs_avg=21.071456909179688, test_abs_avg=21.07115936279297
liger_forward grad[50] vs paper_forward: mean_abs=0.13575267791748047, max_abs=0.5, mean_rel=0.029308747500181198, max_rel=1.283059000968933, norm_rel=0.00793605949729681, ref_abs_avg=17.632322311401367, test_abs_avg=17.626487731933594
liger_forward grad[51] vs paper_forward: mean_abs=0.1594831943511963, max_abs=1.25, mean_rel=0.047283321619033813, max_rel=237.8367462158203, norm_rel=0.007490437477827072, ref_abs_avg=22.346303939819336, test_abs_avg=22.346303939819336
liger_forward grad[52] vs paper_forward: mean_abs=0.15421554446220398, max_abs=1.25, mean_rel=0.04595557600259781, max_rel=389.88323974609375, norm_rel=0.007350474130362272, ref_abs_avg=22.09911346435547, test_abs_avg=22.099790573120117
liger_forward grad[53] vs paper_forward: mean_abs=0.12091267108917236, max_abs=0.5, mean_rel=0.03435979038476944, max_rel=3.26550555229187, norm_rel=0.007594344671815634, ref_abs_avg=16.824047088623047, test_abs_avg=16.822465896606445
liger_forward grad[54] vs paper_forward: mean_abs=0.14434200525283813, max_abs=1.0, mean_rel=0.043702445924282074, max_rel=229.62315368652344, norm_rel=0.0072646113112568855, ref_abs_avg=20.887598037719727, test_abs_avg=20.88752555847168
liger_forward grad[55] vs paper_forward: mean_abs=0.1396143138408661, max_abs=1.0, mean_rel=0.045128148049116135, max_rel=127.85839080810547, norm_rel=0.007186347618699074, ref_abs_avg=20.467557907104492, test_abs_avg=20.46848487854004
liger_forward grad[56] vs paper_forward: mean_abs=0.10913324356079102, max_abs=0.5625, mean_rel=0.023567501455545425, max_rel=1.4659565687179565, norm_rel=0.007261061575263739, ref_abs_avg=16.2299747467041, test_abs_avg=16.232532501220703
liger_forward grad[57] vs paper_forward: mean_abs=0.13404111564159393, max_abs=1.0, mean_rel=0.043471239507198334, max_rel=328.8612976074219, norm_rel=0.0071311709471046925, ref_abs_avg=19.78174591064453, test_abs_avg=19.781307220458984
liger_forward grad[58] vs paper_forward: mean_abs=0.129856139421463, max_abs=1.0, mean_rel=0.045124318450689316, max_rel=154.77549743652344, norm_rel=0.007019030395895243, ref_abs_avg=19.574186325073242, test_abs_avg=19.573612213134766
liger_forward grad[59] vs paper_forward: mean_abs=0.10157394409179688, max_abs=0.40625, mean_rel=0.028039155527949333, max_rel=1.694685459136963, norm_rel=0.006944332737475634, ref_abs_avg=15.888612747192383, test_abs_avg=15.892053604125977
liger_forward grad[60] vs paper_forward: mean_abs=0.12436003983020782, max_abs=1.0, mean_rel=0.040353186428546906, max_rel=141.00408935546875, norm_rel=0.006982860621064901, ref_abs_avg=18.781192779541016, test_abs_avg=18.781187057495117
liger_forward grad[61] vs paper_forward: mean_abs=0.1212085485458374, max_abs=1.0, mean_rel=0.04313213378190994, max_rel=218.45960998535156, norm_rel=0.006820910610258579, ref_abs_avg=18.771984100341797, test_abs_avg=18.771160125732422
liger_forward grad[62] vs paper_forward: mean_abs=0.10200810432434082, max_abs=0.40625, mean_rel=0.02681276947259903, max_rel=4.127302646636963, norm_rel=0.007333647925406694, ref_abs_avg=14.719035148620605, test_abs_avg=14.710681915283203
liger_forward grad[63] vs paper_forward: mean_abs=0.117926225066185, max_abs=1.0, mean_rel=0.040679484605789185, max_rel=207.71592712402344, norm_rel=0.006788024678826332, ref_abs_avg=18.357707977294922, test_abs_avg=18.357450485229492
liger_forward grad[64] vs paper_forward: mean_abs=0.11307097971439362, max_abs=1.0, mean_rel=0.038765378296375275, max_rel=229.1246795654297, norm_rel=0.0066797626204788685, ref_abs_avg=17.94100570678711, test_abs_avg=17.938987731933594
liger_forward grad[65] vs paper_forward: mean_abs=0.09514117240905762, max_abs=0.5, mean_rel=0.05192507803440094, max_rel=13.691671371459961, norm_rel=0.006885000970214605, ref_abs_avg=14.823203086853027, test_abs_avg=14.825483322143555
liger_forward grad[66] vs paper_forward: mean_abs=0.11152810603380203, max_abs=1.0, mean_rel=0.04097648337483406, max_rel=172.4135284423828, norm_rel=0.006643973290920258, ref_abs_avg=17.759254455566406, test_abs_avg=17.758920669555664
liger_forward grad[67] vs paper_forward: mean_abs=0.10613644123077393, max_abs=1.0, mean_rel=0.03848261758685112, max_rel=153.90869140625, norm_rel=0.0064890384674072266, ref_abs_avg=17.407642364501953, test_abs_avg=17.407129287719727
liger_forward grad[68] vs paper_forward: mean_abs=0.08403021097183228, max_abs=0.375, mean_rel=0.026563700288534164, max_rel=1.866390585899353, norm_rel=0.0065236771479249, ref_abs_avg=14.07787036895752, test_abs_avg=14.070605278015137
liger_forward grad[69] vs paper_forward: mean_abs=0.10568489134311676, max_abs=0.84375, mean_rel=0.04160522669553757, max_rel=236.0612335205078, norm_rel=0.006533605977892876, ref_abs_avg=17.164657592773438, test_abs_avg=17.16469955444336
liger_forward grad[70] vs paper_forward: mean_abs=0.1007637232542038, max_abs=1.0, mean_rel=0.038663700222969055, max_rel=270.2802429199219, norm_rel=0.0064480979926884174, ref_abs_avg=16.679357528686523, test_abs_avg=16.678255081176758
liger_forward grad[71] vs paper_forward: mean_abs=0.08375132083892822, max_abs=0.3125, mean_rel=0.04568096250295639, max_rel=13.477724075317383, norm_rel=0.006323350127786398, ref_abs_avg=13.896018981933594, test_abs_avg=13.899450302124023
liger_forward grad[72] vs paper_forward: mean_abs=0.10075646638870239, max_abs=1.0, mean_rel=0.04014594852924347, max_rel=272.2348327636719, norm_rel=0.00651595601812005, ref_abs_avg=16.41591453552246, test_abs_avg=16.415882110595703
liger_forward grad[73] vs paper_forward: mean_abs=0.09767106175422668, max_abs=1.0, mean_rel=0.03972405940294266, max_rel=158.89129638671875, norm_rel=0.006376263219863176, ref_abs_avg=16.320755004882812, test_abs_avg=16.32122039794922
liger_forward grad[74] vs paper_forward: mean_abs=0.09544134140014648, max_abs=0.4375, mean_rel=0.031681835651397705, max_rel=2.407322883605957, norm_rel=0.006831023376435041, ref_abs_avg=14.956634521484375, test_abs_avg=14.960081100463867
liger_forward grad[75] vs paper_forward: mean_abs=0.11661141365766525, max_abs=1.0, mean_rel=0.0448923222720623, max_rel=458.9810485839844, norm_rel=0.007199379615485668, ref_abs_avg=17.04755401611328, test_abs_avg=17.047767639160156
liger_forward grad[76] vs paper_forward: mean_abs=0.11411348730325699, max_abs=1.0, mean_rel=0.04513661190867424, max_rel=310.16839599609375, norm_rel=0.007148636505007744, ref_abs_avg=16.884662628173828, test_abs_avg=16.88579559326172
liger_forward grad[77] vs paper_forward: mean_abs=0.09304344654083252, max_abs=0.375, mean_rel=0.028643682599067688, max_rel=5.345294952392578, norm_rel=0.007174271158874035, ref_abs_avg=14.093037605285645, test_abs_avg=14.093219757080078
liger_forward grad[78] vs paper_forward: mean_abs=0.10774147510528564, max_abs=1.0, mean_rel=0.043735578656196594, max_rel=240.547607421875, norm_rel=0.007022105623036623, ref_abs_avg=16.14815330505371, test_abs_avg=16.14817237854004
liger_forward grad[79] vs paper_forward: mean_abs=0.10540495812892914, max_abs=1.125, mean_rel=0.04220190271735191, max_rel=238.6781768798828, norm_rel=0.007041973061859608, ref_abs_avg=15.829172134399414, test_abs_avg=15.828457832336426
liger_forward grad[80] vs paper_forward: mean_abs=0.07820487022399902, max_abs=0.375, mean_rel=0.03459414839744568, max_rel=3.6255064010620117, norm_rel=0.006672814022749662, ref_abs_avg=12.771222114562988, test_abs_avg=12.774651527404785
liger_forward grad[81] vs paper_forward: mean_abs=0.09794938564300537, max_abs=1.0, mean_rel=0.04180208966135979, max_rel=214.48846435546875, norm_rel=0.00682916259393096, ref_abs_avg=15.163495063781738, test_abs_avg=15.163708686828613
liger_forward grad[82] vs paper_forward: mean_abs=0.09539592266082764, max_abs=1.0, mean_rel=0.04105057567358017, max_rel=204.5991973876953, norm_rel=0.006611695978790522, ref_abs_avg=15.335407257080078, test_abs_avg=15.334911346435547
liger_forward grad[83] vs paper_forward: mean_abs=0.07402908802032471, max_abs=0.34375, mean_rel=0.030250679701566696, max_rel=3.221710205078125, norm_rel=0.006267915014177561, ref_abs_avg=12.701225280761719, test_abs_avg=12.696784973144531
liger_forward grad[84] vs paper_forward: mean_abs=0.09232214093208313, max_abs=1.0, mean_rel=0.038833845406770706, max_rel=233.1905517578125, norm_rel=0.0066255745477974415, ref_abs_avg=14.79465389251709, test_abs_avg=14.794739723205566
liger_forward grad[85] vs paper_forward: mean_abs=0.08880312740802765, max_abs=0.78125, mean_rel=0.03797914460301399, max_rel=120.08377838134766, norm_rel=0.006525853183120489, ref_abs_avg=14.521806716918945, test_abs_avg=14.520950317382812
liger_forward grad[86] vs paper_forward: mean_abs=0.07324515283107758, max_abs=0.3125, mean_rel=0.046659182757139206, max_rel=7.465591907501221, norm_rel=0.0065581900998950005, ref_abs_avg=11.620211601257324, test_abs_avg=11.620855331420898
liger_forward grad[87] vs paper_forward: mean_abs=0.08511464297771454, max_abs=1.0, mean_rel=0.03857790306210518, max_rel=239.1435089111328, norm_rel=0.006474148482084274, ref_abs_avg=14.026880264282227, test_abs_avg=14.027112007141113
liger_forward grad[88] vs paper_forward: mean_abs=0.08307468891143799, max_abs=1.0, mean_rel=0.036926381289958954, max_rel=103.35694122314453, norm_rel=0.006394310388714075, ref_abs_avg=13.963897705078125, test_abs_avg=13.966218948364258
liger_forward grad[89] vs paper_forward: mean_abs=0.06925904750823975, max_abs=0.25, mean_rel=0.047357454895973206, max_rel=7.503344535827637, norm_rel=0.006233702879399061, ref_abs_avg=11.590911865234375, test_abs_avg=11.5889892578125
liger_forward grad[90] vs paper_forward: mean_abs=0.08043089509010315, max_abs=1.0, mean_rel=0.036589935421943665, max_rel=139.86268615722656, norm_rel=0.0062495749443769455, ref_abs_avg=13.847569465637207, test_abs_avg=13.847800254821777
liger_forward grad[91] vs paper_forward: mean_abs=0.07651177048683167, max_abs=1.0, mean_rel=0.03833732381463051, max_rel=278.86572265625, norm_rel=0.006281237117946148, ref_abs_avg=13.234946250915527, test_abs_avg=13.235311508178711
liger_forward grad[92] vs paper_forward: mean_abs=0.06000256538391113, max_abs=0.25, mean_rel=0.0249498188495636, max_rel=3.6014206409454346, norm_rel=0.005973211955279112, ref_abs_avg=11.109933853149414, test_abs_avg=11.114900588989258
liger_forward grad[93] vs paper_forward: mean_abs=0.07525902986526489, max_abs=0.75, mean_rel=0.035392649471759796, max_rel=125.80104064941406, norm_rel=0.006205196958035231, ref_abs_avg=13.098028182983398, test_abs_avg=13.09843921661377
liger_forward grad[94] vs paper_forward: mean_abs=0.07283966988325119, max_abs=1.0, mean_rel=0.032603636384010315, max_rel=96.80108642578125, norm_rel=0.00591775868088007, ref_abs_avg=13.382716178894043, test_abs_avg=13.383162498474121
liger_forward grad[95] vs paper_forward: mean_abs=0.058727413415908813, max_abs=0.25, mean_rel=0.018105048686265945, max_rel=0.9397565126419067, norm_rel=0.005855342373251915, ref_abs_avg=10.786245346069336, test_abs_avg=10.7831449508667
liger_forward grad[96] vs paper_forward: mean_abs=0.07068412005901337, max_abs=1.0, mean_rel=0.03353458270430565, max_rel=183.65260314941406, norm_rel=0.005947948899120092, ref_abs_avg=13.034200668334961, test_abs_avg=13.034337997436523
liger_forward grad[97] vs paper_forward: mean_abs=0.0675404965877533, max_abs=1.0, mean_rel=0.031636036932468414, max_rel=102.6986083984375, norm_rel=0.005872820969671011, ref_abs_avg=12.713068962097168, test_abs_avg=12.713129997253418
identity layers + randn queries
production_forward fwd+bwd:  33.818 ms
production_forward bwd-only: 28.937 ms
production_forward peak allocated: fwd=1.174 GiB, fwd+bwd=5.176 GiB
production_forward peak reserved:  fwd=1.246 GiB, fwd+bwd=5.246 GiB
liger_forward fwd+bwd:  54.692 ms
liger_forward bwd-only: 42.335 ms
liger_forward peak allocated: fwd=7.727 GiB, fwd+bwd=7.727 GiB
liger_forward peak reserved:  fwd=7.773 GiB, fwd+bwd=8.086 GiB
torch_compile_phases_forward fwd+bwd:  48.542 ms
torch_compile_phases_forward bwd-only: 39.278 ms
torch_compile_phases_forward peak allocated: fwd=6.470 GiB, fwd+bwd=6.784 GiB
torch_compile_phases_forward peak reserved:  fwd=6.627 GiB, fwd+bwd=8.752 GiB
paper_forward fwd+bwd:  112.844 ms
paper_forward bwd-only: 88.989 ms
paper_forward peak allocated: fwd=14.930 GiB, fwd+bwd=15.990 GiB
paper_forward peak reserved:  fwd=14.975 GiB, fwd+bwd=16.350 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001690475270152092, max_abs=0.03515625
production_forward grad[0] vs paper_forward: mean_abs=0.008570976555347443, max_abs=0.3671875, mean_rel=0.0723910927772522, max_rel=77.8880386352539, norm_rel=0.019774816930294037, ref_abs_avg=0.47123289108276367, test_abs_avg=0.4712415933609009
production_forward grad[1] vs paper_forward: mean_abs=5.300645351409912, max_abs=48.0, mean_rel=0.1909339874982834, max_rel=818.134521484375, norm_rel=0.020820321515202522, ref_abs_avg=231.5425262451172, test_abs_avg=231.6774139404297
production_forward grad[2] vs paper_forward: mean_abs=0.8614506721496582, max_abs=3.5, mean_rel=0.2847585678100586, max_rel=43.30634689331055, norm_rel=0.022422758862376213, ref_abs_avg=39.29471969604492, test_abs_avg=39.34194564819336
production_forward grad[3] vs paper_forward: mean_abs=1.094426155090332, max_abs=7.125, mean_rel=0.14825904369354248, max_rel=989.5122680664062, norm_rel=0.022965339943766594, ref_abs_avg=47.883628845214844, test_abs_avg=47.886962890625
production_forward grad[4] vs paper_forward: mean_abs=1.087675929069519, max_abs=7.0, mean_rel=0.16142481565475464, max_rel=929.33349609375, norm_rel=0.022995740175247192, ref_abs_avg=47.53989028930664, test_abs_avg=47.53086853027344
production_forward grad[5] vs paper_forward: mean_abs=0.7977457046508789, max_abs=3.375, mean_rel=0.09330551326274872, max_rel=6.526324272155762, norm_rel=0.022917114198207855, ref_abs_avg=34.11534118652344, test_abs_avg=34.129798889160156
production_forward grad[6] vs paper_forward: mean_abs=0.9563356041908264, max_abs=6.25, mean_rel=0.1440247893333435, max_rel=1619.44921875, norm_rel=0.022704534232616425, ref_abs_avg=42.341819763183594, test_abs_avg=42.347816467285156
production_forward grad[7] vs paper_forward: mean_abs=0.9305622577667236, max_abs=7.0, mean_rel=0.14062148332595825, max_rel=660.4015502929688, norm_rel=0.02228607051074505, ref_abs_avg=41.967689514160156, test_abs_avg=41.968902587890625
production_forward grad[8] vs paper_forward: mean_abs=0.7602231502532959, max_abs=3.0, mean_rel=0.14241524040699005, max_rel=21.052701950073242, norm_rel=0.023005211725831032, ref_abs_avg=32.527496337890625, test_abs_avg=32.50926971435547
production_forward grad[9] vs paper_forward: mean_abs=0.8746988773345947, max_abs=5.6875, mean_rel=0.15699094533920288, max_rel=855.0771484375, norm_rel=0.022520391270518303, ref_abs_avg=39.03321838378906, test_abs_avg=39.03398132324219
production_forward grad[10] vs paper_forward: mean_abs=0.8492975234985352, max_abs=5.625, mean_rel=0.15673226118087769, max_rel=884.0570068359375, norm_rel=0.022228511050343513, ref_abs_avg=38.433815002441406, test_abs_avg=38.441162109375
production_forward grad[11] vs paper_forward: mean_abs=0.6799142360687256, max_abs=2.5, mean_rel=0.08285795152187347, max_rel=8.4342679977417, norm_rel=0.022457994520664215, ref_abs_avg=30.580049514770508, test_abs_avg=30.60467529296875
production_forward grad[12] vs paper_forward: mean_abs=0.8053936958312988, max_abs=6.0, mean_rel=0.15335772931575775, max_rel=824.036376953125, norm_rel=0.022350085899233818, ref_abs_avg=36.194068908691406, test_abs_avg=36.195701599121094
production_forward grad[13] vs paper_forward: mean_abs=0.7920042872428894, max_abs=5.25, mean_rel=0.14686307311058044, max_rel=1294.099609375, norm_rel=0.022121960297226906, ref_abs_avg=35.99748992919922, test_abs_avg=35.995994567871094
production_forward grad[14] vs paper_forward: mean_abs=0.6474647521972656, max_abs=2.8125, mean_rel=0.08778533339500427, max_rel=6.352316856384277, norm_rel=0.02427869848906994, ref_abs_avg=27.546123504638672, test_abs_avg=27.555511474609375
production_forward grad[15] vs paper_forward: mean_abs=0.7544454336166382, max_abs=4.625, mean_rel=0.14330483973026276, max_rel=897.4396362304688, norm_rel=0.02219272404909134, ref_abs_avg=34.13508224487305, test_abs_avg=34.13498306274414
production_forward grad[16] vs paper_forward: mean_abs=0.7358930110931396, max_abs=4.5, mean_rel=0.1703377366065979, max_rel=1145.6697998046875, norm_rel=0.02196381241083145, ref_abs_avg=33.70772171020508, test_abs_avg=33.70921325683594
production_forward grad[17] vs paper_forward: mean_abs=0.569817066192627, max_abs=2.03125, mean_rel=0.07323342561721802, max_rel=5.171975135803223, norm_rel=0.022253064438700676, ref_abs_avg=25.627838134765625, test_abs_avg=25.684438705444336
production_forward grad[18] vs paper_forward: mean_abs=0.7106019258499146, max_abs=4.5625, mean_rel=0.13476772606372833, max_rel=894.1198120117188, norm_rel=0.022037969902157784, ref_abs_avg=32.40326690673828, test_abs_avg=32.40411376953125
production_forward grad[19] vs paper_forward: mean_abs=0.6899597644805908, max_abs=4.0, mean_rel=0.15059752762317657, max_rel=872.10107421875, norm_rel=0.021890318021178246, ref_abs_avg=31.68439483642578, test_abs_avg=31.68685531616211
production_forward grad[20] vs paper_forward: mean_abs=0.5591316223144531, max_abs=2.625, mean_rel=0.0628817081451416, max_rel=1.7689175605773926, norm_rel=0.022959738969802856, ref_abs_avg=25.144874572753906, test_abs_avg=25.172405242919922
production_forward grad[21] vs paper_forward: mean_abs=0.6674083471298218, max_abs=4.25, mean_rel=0.14965105056762695, max_rel=1201.9046630859375, norm_rel=0.022070297971367836, ref_abs_avg=30.394535064697266, test_abs_avg=30.398353576660156
production_forward grad[22] vs paper_forward: mean_abs=0.6485824584960938, max_abs=4.32421875, mean_rel=0.15414930880069733, max_rel=663.6445922851562, norm_rel=0.021710487082600594, ref_abs_avg=30.046524047851562, test_abs_avg=30.047771453857422
production_forward grad[23] vs paper_forward: mean_abs=0.527379035949707, max_abs=2.0, mean_rel=0.1185152679681778, max_rel=16.02564239501953, norm_rel=0.022461093962192535, ref_abs_avg=23.58572006225586, test_abs_avg=23.59735679626465
production_forward grad[24] vs paper_forward: mean_abs=0.6353780031204224, max_abs=4.203125, mean_rel=0.14426113665103912, max_rel=1364.50732421875, norm_rel=0.021855253726243973, ref_abs_avg=29.22460174560547, test_abs_avg=29.225990295410156
production_forward grad[25] vs paper_forward: mean_abs=0.6189099550247192, max_abs=4.0, mean_rel=0.14462053775787354, max_rel=880.8927612304688, norm_rel=0.02147924341261387, ref_abs_avg=28.95445442199707, test_abs_avg=28.955738067626953
production_forward grad[26] vs paper_forward: mean_abs=0.6075420379638672, max_abs=3.0, mean_rel=0.09285029023885727, max_rel=3.2811179161071777, norm_rel=0.02489343099296093, ref_abs_avg=24.804443359375, test_abs_avg=24.754703521728516
production_forward grad[27] vs paper_forward: mean_abs=0.7313904762268066, max_abs=4.75, mean_rel=0.15579360723495483, max_rel=1012.4486083984375, norm_rel=0.023655705153942108, ref_abs_avg=31.06580924987793, test_abs_avg=31.06732940673828
production_forward grad[28] vs paper_forward: mean_abs=0.7152670621871948, max_abs=5.0, mean_rel=0.15986377000808716, max_rel=1183.0469970703125, norm_rel=0.02341155894100666, ref_abs_avg=30.675973892211914, test_abs_avg=30.67720603942871
production_forward grad[29] vs paper_forward: mean_abs=0.5544724464416504, max_abs=2.15625, mean_rel=0.11090049147605896, max_rel=13.696045875549316, norm_rel=0.024788973852992058, ref_abs_avg=22.298418045043945, test_abs_avg=22.262657165527344
production_forward grad[30] vs paper_forward: mean_abs=0.6763602495193481, max_abs=4.75, mean_rel=0.1663222312927246, max_rel=1247.373046875, norm_rel=0.024032967165112495, ref_abs_avg=28.24618911743164, test_abs_avg=28.24532127380371
production_forward grad[31] vs paper_forward: mean_abs=0.6679927110671997, max_abs=4.625, mean_rel=0.1492256075143814, max_rel=817.724365234375, norm_rel=0.024042466655373573, ref_abs_avg=27.879772186279297, test_abs_avg=27.88283920288086
production_forward grad[32] vs paper_forward: mean_abs=0.5203452706336975, max_abs=1.875, mean_rel=0.8550171852111816, max_rel=401.00054931640625, norm_rel=0.025349298492074013, ref_abs_avg=20.547639846801758, test_abs_avg=20.575809478759766
production_forward grad[33] vs paper_forward: mean_abs=0.6330138444900513, max_abs=3.875, mean_rel=0.15134698152542114, max_rel=842.0806274414062, norm_rel=0.02390635944902897, ref_abs_avg=26.577991485595703, test_abs_avg=26.5789794921875
production_forward grad[34] vs paper_forward: mean_abs=0.6245663166046143, max_abs=3.7890625, mean_rel=0.15901173651218414, max_rel=1385.230712890625, norm_rel=0.023645823821425438, ref_abs_avg=26.47996711730957, test_abs_avg=26.474416732788086
production_forward grad[35] vs paper_forward: mean_abs=0.5023608207702637, max_abs=2.0, mean_rel=0.24016335606575012, max_rel=66.75778198242188, norm_rel=0.02391485683619976, ref_abs_avg=20.743974685668945, test_abs_avg=20.727828979492188
production_forward grad[36] vs paper_forward: mean_abs=0.5959252119064331, max_abs=4.125, mean_rel=0.16261324286460876, max_rel=1660.19189453125, norm_rel=0.023695513606071472, ref_abs_avg=25.19529151916504, test_abs_avg=25.194828033447266
production_forward grad[37] vs paper_forward: mean_abs=0.585046112537384, max_abs=4.0, mean_rel=0.1565060019493103, max_rel=727.0696411132812, norm_rel=0.023743562400341034, ref_abs_avg=24.69623374938965, test_abs_avg=24.694599151611328
production_forward grad[38] vs paper_forward: mean_abs=0.4753778874874115, max_abs=2.140625, mean_rel=0.09324716031551361, max_rel=5.400478839874268, norm_rel=0.02282828465104103, ref_abs_avg=20.605052947998047, test_abs_avg=20.634899139404297
production_forward grad[39] vs paper_forward: mean_abs=0.5650777816772461, max_abs=4.0625, mean_rel=0.14835119247436523, max_rel=790.3250732421875, norm_rel=0.02346617355942726, ref_abs_avg=24.128231048583984, test_abs_avg=24.127384185791016
production_forward grad[40] vs paper_forward: mean_abs=0.5531201362609863, max_abs=3.25, mean_rel=0.15211839973926544, max_rel=736.14892578125, norm_rel=0.02337430603802204, ref_abs_avg=23.667037963867188, test_abs_avg=23.667957305908203
production_forward grad[41] vs paper_forward: mean_abs=0.4284294843673706, max_abs=1.4375, mean_rel=0.081550233066082, max_rel=10.133654594421387, norm_rel=0.022718703374266624, ref_abs_avg=18.750926971435547, test_abs_avg=18.76256561279297
production_forward grad[42] vs paper_forward: mean_abs=0.5375592708587646, max_abs=3.5, mean_rel=0.1460368037223816, max_rel=729.0755615234375, norm_rel=0.02332920953631401, ref_abs_avg=23.082015991210938, test_abs_avg=23.08270263671875
production_forward grad[43] vs paper_forward: mean_abs=0.5282965302467346, max_abs=3.375, mean_rel=0.15483126044273376, max_rel=632.8642578125, norm_rel=0.023295238614082336, ref_abs_avg=22.717052459716797, test_abs_avg=22.72325897216797
production_forward grad[44] vs paper_forward: mean_abs=0.4014625549316406, max_abs=1.3125, mean_rel=0.10287489742040634, max_rel=5.937451362609863, norm_rel=0.021545663475990295, ref_abs_avg=18.331256866455078, test_abs_avg=18.335163116455078
production_forward grad[45] vs paper_forward: mean_abs=0.5152339935302734, max_abs=3.5, mean_rel=0.15192024409770966, max_rel=567.5462646484375, norm_rel=0.023005187511444092, ref_abs_avg=22.41339111328125, test_abs_avg=22.413753509521484
production_forward grad[46] vs paper_forward: mean_abs=0.5012656450271606, max_abs=3.25, mean_rel=0.15178854763507843, max_rel=827.2711791992188, norm_rel=0.022958941757678986, ref_abs_avg=21.86574935913086, test_abs_avg=21.86042022705078
production_forward grad[47] vs paper_forward: mean_abs=0.40000036358833313, max_abs=1.5, mean_rel=0.2597059905529022, max_rel=68.85807800292969, norm_rel=0.022633425891399384, ref_abs_avg=17.220752716064453, test_abs_avg=17.192970275878906
production_forward grad[48] vs paper_forward: mean_abs=0.48767346143722534, max_abs=3.125, mean_rel=0.1511002779006958, max_rel=934.7083129882812, norm_rel=0.022917957976460457, ref_abs_avg=21.26679229736328, test_abs_avg=21.26723861694336
production_forward grad[49] vs paper_forward: mean_abs=0.482260525226593, max_abs=3.0, mean_rel=0.1478661447763443, max_rel=654.7545776367188, norm_rel=0.022804809734225273, ref_abs_avg=21.175899505615234, test_abs_avg=21.178245544433594
production_forward grad[50] vs paper_forward: mean_abs=0.4494442939758301, max_abs=1.78125, mean_rel=0.17048004269599915, max_rel=15.62367057800293, norm_rel=0.023539800196886063, ref_abs_avg=18.568939208984375, test_abs_avg=18.541259765625
production_forward grad[51] vs paper_forward: mean_abs=0.5546921491622925, max_abs=4.0, mean_rel=0.153474360704422, max_rel=899.8161010742188, norm_rel=0.02442452311515808, ref_abs_avg=22.781658172607422, test_abs_avg=22.7840576171875
production_forward grad[52] vs paper_forward: mean_abs=0.5398067235946655, max_abs=3.75, mean_rel=0.15184006094932556, max_rel=1358.7747802734375, norm_rel=0.024166785180568695, ref_abs_avg=22.42728042602539, test_abs_avg=22.426395416259766
production_forward grad[53] vs paper_forward: mean_abs=0.41933417320251465, max_abs=1.515625, mean_rel=0.07918809354305267, max_rel=3.8421010971069336, norm_rel=0.024643387645483017, ref_abs_avg=16.968997955322266, test_abs_avg=16.922523498535156
production_forward grad[54] vs paper_forward: mean_abs=0.5134353041648865, max_abs=3.6875, mean_rel=0.16069422662258148, max_rel=1242.1949462890625, norm_rel=0.024175813421607018, ref_abs_avg=21.264896392822266, test_abs_avg=21.267391204833984
production_forward grad[55] vs paper_forward: mean_abs=0.5053951740264893, max_abs=3.375, mean_rel=0.14812442660331726, max_rel=1354.212890625, norm_rel=0.024157343432307243, ref_abs_avg=20.952611923217773, test_abs_avg=20.951570510864258
production_forward grad[56] vs paper_forward: mean_abs=0.3729069232940674, max_abs=1.625, mean_rel=0.19132086634635925, max_rel=32.623870849609375, norm_rel=0.02298871800303459, ref_abs_avg=16.623029708862305, test_abs_avg=16.630329132080078
production_forward grad[57] vs paper_forward: mean_abs=0.4733504056930542, max_abs=3.25, mean_rel=0.15824219584465027, max_rel=1260.631591796875, norm_rel=0.023522820323705673, ref_abs_avg=20.099353790283203, test_abs_avg=20.100635528564453
production_forward grad[58] vs paper_forward: mean_abs=0.4667929410934448, max_abs=3.5, mean_rel=0.1659243106842041, max_rel=1539.1846923828125, norm_rel=0.02328413724899292, ref_abs_avg=20.01338768005371, test_abs_avg=20.01519775390625
production_forward grad[59] vs paper_forward: mean_abs=0.3765087127685547, max_abs=1.5, mean_rel=0.0818837434053421, max_rel=3.6139800548553467, norm_rel=0.02274899370968342, ref_abs_avg=16.565650939941406, test_abs_avg=16.556001663208008
production_forward grad[60] vs paper_forward: mean_abs=0.4434407651424408, max_abs=3.125, mean_rel=0.1546325981616974, max_rel=872.2398681640625, norm_rel=0.023117853328585625, ref_abs_avg=19.161151885986328, test_abs_avg=19.16143035888672
production_forward grad[61] vs paper_forward: mean_abs=0.4357437193393707, max_abs=3.0, mean_rel=0.16135349869728088, max_rel=1227.355224609375, norm_rel=0.02273339219391346, ref_abs_avg=19.140321731567383, test_abs_avg=19.142894744873047
production_forward grad[62] vs paper_forward: mean_abs=0.3452978730201721, max_abs=1.5, mean_rel=0.2103254795074463, max_rel=41.66666793823242, norm_rel=0.022999538108706474, ref_abs_avg=15.238761901855469, test_abs_avg=15.241233825683594
production_forward grad[63] vs paper_forward: mean_abs=0.42104819416999817, max_abs=3.0, mean_rel=0.15693800151348114, max_rel=866.2705688476562, norm_rel=0.022784102708101273, ref_abs_avg=18.459152221679688, test_abs_avg=18.459388732910156
production_forward grad[64] vs paper_forward: mean_abs=0.40816813707351685, max_abs=2.75, mean_rel=0.14106279611587524, max_rel=790.7391357421875, norm_rel=0.02232878841459751, ref_abs_avg=18.289257049560547, test_abs_avg=18.286624908447266
production_forward grad[65] vs paper_forward: mean_abs=0.33008623123168945, max_abs=1.25, mean_rel=0.06667020916938782, max_rel=2.5121231079101562, norm_rel=0.021770255640149117, ref_abs_avg=15.10603141784668, test_abs_avg=15.125581741333008
production_forward grad[66] vs paper_forward: mean_abs=0.3951418995857239, max_abs=2.75, mean_rel=0.14592546224594116, max_rel=513.4931640625, norm_rel=0.022405674681067467, ref_abs_avg=17.621633529663086, test_abs_avg=17.622455596923828
production_forward grad[67] vs paper_forward: mean_abs=0.38776183128356934, max_abs=3.5, mean_rel=0.14442403614521027, max_rel=550.980712890625, norm_rel=0.021935921162366867, ref_abs_avg=17.68488121032715, test_abs_avg=17.68755340576172
production_forward grad[68] vs paper_forward: mean_abs=0.3015986680984497, max_abs=1.875, mean_rel=0.14428699016571045, max_rel=14.149253845214844, norm_rel=0.021866753697395325, ref_abs_avg=14.100732803344727, test_abs_avg=14.105449676513672
production_forward grad[69] vs paper_forward: mean_abs=0.37828975915908813, max_abs=2.90625, mean_rel=0.13625964522361755, max_rel=597.0001220703125, norm_rel=0.021877022460103035, ref_abs_avg=17.246339797973633, test_abs_avg=17.24660873413086
production_forward grad[70] vs paper_forward: mean_abs=0.37341922521591187, max_abs=2.75, mean_rel=0.14566966891288757, max_rel=775.0009765625, norm_rel=0.022197594866156578, ref_abs_avg=16.84173583984375, test_abs_avg=16.835556030273438
production_forward grad[71] vs paper_forward: mean_abs=0.30233311653137207, max_abs=1.125, mean_rel=0.15163981914520264, max_rel=37.40860366821289, norm_rel=0.020724916830658913, ref_abs_avg=14.514824867248535, test_abs_avg=14.477861404418945
production_forward grad[72] vs paper_forward: mean_abs=0.36122027039527893, max_abs=2.9375, mean_rel=0.1395314335823059, max_rel=724.7791748046875, norm_rel=0.02172132395207882, ref_abs_avg=16.60662841796875, test_abs_avg=16.607032775878906
production_forward grad[73] vs paper_forward: mean_abs=0.3545610308647156, max_abs=2.625, mean_rel=0.14543865621089935, max_rel=662.6400146484375, norm_rel=0.021461691707372665, ref_abs_avg=16.490877151489258, test_abs_avg=16.48583984375
production_forward grad[74] vs paper_forward: mean_abs=0.3317418098449707, max_abs=1.35546875, mean_rel=0.11659355461597443, max_rel=16.503849029541016, norm_rel=0.022242262959480286, ref_abs_avg=14.833686828613281, test_abs_avg=14.798373222351074
production_forward grad[75] vs paper_forward: mean_abs=0.4030449390411377, max_abs=3.0625, mean_rel=0.15262313187122345, max_rel=921.888671875, norm_rel=0.022584130987524986, ref_abs_avg=17.82378578186035, test_abs_avg=17.82272720336914
production_forward grad[76] vs paper_forward: mean_abs=0.3879932463169098, max_abs=3.0, mean_rel=0.14586150646209717, max_rel=647.109619140625, norm_rel=0.022295314818620682, ref_abs_avg=17.395557403564453, test_abs_avg=17.39710235595703
production_forward grad[77] vs paper_forward: mean_abs=0.32317399978637695, max_abs=1.25, mean_rel=0.08348828554153442, max_rel=4.066248893737793, norm_rel=0.021968578919768333, ref_abs_avg=14.455564498901367, test_abs_avg=14.456766128540039
production_forward grad[78] vs paper_forward: mean_abs=0.377069354057312, max_abs=2.75, mean_rel=0.1464158147573471, max_rel=694.0805053710938, norm_rel=0.022326279431581497, ref_abs_avg=16.881980895996094, test_abs_avg=16.881954193115234
production_forward grad[79] vs paper_forward: mean_abs=0.36752673983573914, max_abs=3.625, mean_rel=0.13886189460754395, max_rel=530.02490234375, norm_rel=0.02198733016848564, ref_abs_avg=16.68224334716797, test_abs_avg=16.6824893951416
production_forward grad[80] vs paper_forward: mean_abs=0.27992820739746094, max_abs=1.375, mean_rel=0.08312481641769409, max_rel=3.6875882148742676, norm_rel=0.020332181826233864, ref_abs_avg=14.060365676879883, test_abs_avg=14.081619262695312
production_forward grad[81] vs paper_forward: mean_abs=0.343991756439209, max_abs=3.0, mean_rel=0.1314064860343933, max_rel=445.3124694824219, norm_rel=0.02149416320025921, ref_abs_avg=15.997983932495117, test_abs_avg=15.99829387664795
production_forward grad[82] vs paper_forward: mean_abs=0.3345978856086731, max_abs=3.31640625, mean_rel=0.15331806242465973, max_rel=784.5194091796875, norm_rel=0.021323129534721375, ref_abs_avg=15.668106079101562, test_abs_avg=15.672796249389648
production_forward grad[83] vs paper_forward: mean_abs=0.2783527374267578, max_abs=1.25, mean_rel=0.07127592712640762, max_rel=1.741286039352417, norm_rel=0.022179143503308296, ref_abs_avg=12.285759925842285, test_abs_avg=12.288777351379395
production_forward grad[84] vs paper_forward: mean_abs=0.320809543132782, max_abs=2.9375, mean_rel=0.12970882654190063, max_rel=441.4902038574219, norm_rel=0.02114214561879635, ref_abs_avg=15.17086410522461, test_abs_avg=15.170442581176758
production_forward grad[85] vs paper_forward: mean_abs=0.31759488582611084, max_abs=3.25, mean_rel=0.1317298412322998, max_rel=474.2469482421875, norm_rel=0.021007901057600975, ref_abs_avg=15.097986221313477, test_abs_avg=15.096528053283691
production_forward grad[86] vs paper_forward: mean_abs=0.24786782264709473, max_abs=1.125, mean_rel=0.14709049463272095, max_rel=28.959569931030273, norm_rel=0.02109578624367714, ref_abs_avg=11.788803100585938, test_abs_avg=11.79942512512207
production_forward grad[87] vs paper_forward: mean_abs=0.2945929169654846, max_abs=2.5, mean_rel=0.12710431218147278, max_rel=684.0060424804688, norm_rel=0.02048138901591301, ref_abs_avg=14.420707702636719, test_abs_avg=14.419408798217773
production_forward grad[88] vs paper_forward: mean_abs=0.29204726219177246, max_abs=3.25, mean_rel=0.13711011409759521, max_rel=762.053466796875, norm_rel=0.020291093736886978, ref_abs_avg=14.473323822021484, test_abs_avg=14.474367141723633
production_forward grad[89] vs paper_forward: mean_abs=0.2430720329284668, max_abs=0.96875, mean_rel=0.058988094329833984, max_rel=1.688708782196045, norm_rel=0.019784990698099136, ref_abs_avg=12.211177825927734, test_abs_avg=12.209434509277344
production_forward grad[90] vs paper_forward: mean_abs=0.2907910943031311, max_abs=3.25, mean_rel=0.11682219803333282, max_rel=581.4277954101562, norm_rel=0.020102698355913162, ref_abs_avg=14.53785514831543, test_abs_avg=14.5371675491333
production_forward grad[91] vs paper_forward: mean_abs=0.2794322967529297, max_abs=3.0, mean_rel=0.13166500627994537, max_rel=634.4935302734375, norm_rel=0.02024161070585251, ref_abs_avg=13.968740463256836, test_abs_avg=13.961466789245605
production_forward grad[92] vs paper_forward: mean_abs=0.22354957461357117, max_abs=0.79296875, mean_rel=0.12007386982440948, max_rel=29.793378829956055, norm_rel=0.01979437656700611, ref_abs_avg=11.221468925476074, test_abs_avg=11.22700309753418
production_forward grad[93] vs paper_forward: mean_abs=0.2647807002067566, max_abs=2.5, mean_rel=0.11718359589576721, max_rel=515.5100708007812, norm_rel=0.0193787869066, ref_abs_avg=13.809428215026855, test_abs_avg=13.807493209838867
production_forward grad[94] vs paper_forward: mean_abs=0.252419114112854, max_abs=3.0, mean_rel=0.11324869841337204, max_rel=331.4426574707031, norm_rel=0.0182622242718935, ref_abs_avg=13.865254402160645, test_abs_avg=13.866292953491211
production_forward grad[95] vs paper_forward: mean_abs=0.21855628490447998, max_abs=0.8125, mean_rel=0.20438705384731293, max_rel=37.442787170410156, norm_rel=0.019755858927965164, ref_abs_avg=11.103204727172852, test_abs_avg=11.100140571594238
production_forward grad[96] vs paper_forward: mean_abs=0.2514793574810028, max_abs=3.28125, mean_rel=0.11650516092777252, max_rel=612.9004516601562, norm_rel=0.018969641998410225, ref_abs_avg=13.483221054077148, test_abs_avg=13.48194408416748
production_forward grad[97] vs paper_forward: mean_abs=0.2418096512556076, max_abs=3.25, mean_rel=0.11434432119131088, max_rel=306.5103759765625, norm_rel=0.01869233138859272, ref_abs_avg=13.206521987915039, test_abs_avg=13.211179733276367
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016944496892392635, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008926576003432274, max_abs=0.35546875, mean_rel=0.07510532438755035, max_rel=70.2012710571289, norm_rel=0.020458746701478958, ref_abs_avg=0.47123289108276367, test_abs_avg=0.4712306559085846
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=5.381097316741943, max_abs=40.0, mean_rel=0.1860167235136032, max_rel=774.7085571289062, norm_rel=0.02102069929242134, ref_abs_avg=231.5425262451172, test_abs_avg=231.6988983154297
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=0.9321243762969971, max_abs=3.25, mean_rel=0.26414725184440613, max_rel=44.3630256652832, norm_rel=0.023579047992825508, ref_abs_avg=39.29471969604492, test_abs_avg=39.306983947753906
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.1330323219299316, max_abs=7.5, mean_rel=0.15459510684013367, max_rel=1152.6729736328125, norm_rel=0.0237728264182806, ref_abs_avg=47.883628845214844, test_abs_avg=47.882530212402344
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.1204884052276611, max_abs=7.5, mean_rel=0.17041143774986267, max_rel=968.4952392578125, norm_rel=0.023671014234423637, ref_abs_avg=47.53989028930664, test_abs_avg=47.52757263183594
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=0.81591796875, max_abs=3.375, mean_rel=0.11187253892421722, max_rel=9.977811813354492, norm_rel=0.024086201563477516, ref_abs_avg=34.11534118652344, test_abs_avg=34.134368896484375
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=0.9866379499435425, max_abs=6.5, mean_rel=0.15048712491989136, max_rel=1468.665283203125, norm_rel=0.023415809497237206, ref_abs_avg=42.341819763183594, test_abs_avg=42.34637451171875
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=0.9593182802200317, max_abs=6.5, mean_rel=0.15042522549629211, max_rel=599.3362426757812, norm_rel=0.022954629734158516, ref_abs_avg=41.967689514160156, test_abs_avg=41.969905853271484
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.7190358638763428, max_abs=2.5, mean_rel=0.12137465178966522, max_rel=24.121646881103516, norm_rel=0.02199738845229149, ref_abs_avg=32.527496337890625, test_abs_avg=32.51878356933594
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=0.901684045791626, max_abs=5.5, mean_rel=0.15953896939754486, max_rel=1007.8409423828125, norm_rel=0.02320139668881893, ref_abs_avg=39.03321838378906, test_abs_avg=39.0330810546875
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=0.8760257959365845, max_abs=5.5, mean_rel=0.1625828891992569, max_rel=974.3453979492188, norm_rel=0.0229189395904541, ref_abs_avg=38.433815002441406, test_abs_avg=38.439781188964844
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.7103052139282227, max_abs=2.75, mean_rel=0.08676819503307343, max_rel=8.777066230773926, norm_rel=0.02330789342522621, ref_abs_avg=30.580049514770508, test_abs_avg=30.620166778564453
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=0.8281563520431519, max_abs=5.5, mean_rel=0.1550895720720291, max_rel=1173.425537109375, norm_rel=0.02300153858959675, ref_abs_avg=36.194068908691406, test_abs_avg=36.19480895996094
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=0.8134065866470337, max_abs=5.5, mean_rel=0.14863431453704834, max_rel=1505.3206787109375, norm_rel=0.022722382098436356, ref_abs_avg=35.99748992919922, test_abs_avg=35.9996452331543
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.6830422878265381, max_abs=2.625, mean_rel=0.08091332018375397, max_rel=4.072681427001953, norm_rel=0.024974623695015907, ref_abs_avg=27.546123504638672, test_abs_avg=27.516077041625977
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=0.7751514911651611, max_abs=4.75, mean_rel=0.14869356155395508, max_rel=1024.203857421875, norm_rel=0.02279958315193653, ref_abs_avg=34.13508224487305, test_abs_avg=34.13383483886719
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.756636381149292, max_abs=5.0, mean_rel=0.17082341015338898, max_rel=978.8071899414062, norm_rel=0.02258099801838398, ref_abs_avg=33.70772171020508, test_abs_avg=33.710594177246094
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.5820715427398682, max_abs=2.125, mean_rel=0.07760363817214966, max_rel=4.894595623016357, norm_rel=0.022706134244799614, ref_abs_avg=25.627838134765625, test_abs_avg=25.6728515625
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=0.7290231585502625, max_abs=4.53125, mean_rel=0.13923555612564087, max_rel=1141.42236328125, norm_rel=0.02258671261370182, ref_abs_avg=32.40326690673828, test_abs_avg=32.40475845336914
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.7080036997795105, max_abs=4.25, mean_rel=0.15878477692604065, max_rel=832.5831909179688, norm_rel=0.022462904453277588, ref_abs_avg=31.68439483642578, test_abs_avg=31.685680389404297
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.5590572357177734, max_abs=2.6875, mean_rel=0.06286933273077011, max_rel=3.0759217739105225, norm_rel=0.022805508226156235, ref_abs_avg=25.144874572753906, test_abs_avg=25.172090530395508
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.6840457916259766, max_abs=4.375, mean_rel=0.15160374343395233, max_rel=1168.5263671875, norm_rel=0.022595489397644997, ref_abs_avg=30.394535064697266, test_abs_avg=30.398279190063477
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.6661180853843689, max_abs=4.3125, mean_rel=0.15657532215118408, max_rel=1211.683349609375, norm_rel=0.022280598059296608, ref_abs_avg=30.046524047851562, test_abs_avg=30.047931671142578
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.544661819934845, max_abs=2.5, mean_rel=0.11791672557592392, max_rel=17.295000076293945, norm_rel=0.023400641977787018, ref_abs_avg=23.58572006225586, test_abs_avg=23.570404052734375
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.6492582559585571, max_abs=4.578125, mean_rel=0.1474703848361969, max_rel=1576.569580078125, norm_rel=0.022339366376399994, ref_abs_avg=29.22460174560547, test_abs_avg=29.225852966308594
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.6331408023834229, max_abs=3.796875, mean_rel=0.14808997511863708, max_rel=879.7185668945312, norm_rel=0.021948302164673805, ref_abs_avg=28.95445442199707, test_abs_avg=28.956172943115234
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.6276688575744629, max_abs=2.4375, mean_rel=0.10099680721759796, max_rel=4.322821140289307, norm_rel=0.02515052631497383, ref_abs_avg=24.804443359375, test_abs_avg=24.757946014404297
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=0.7505786418914795, max_abs=4.625, mean_rel=0.1555989384651184, max_rel=1177.1846923828125, norm_rel=0.024258635938167572, ref_abs_avg=31.06580924987793, test_abs_avg=31.06717300415039
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.7340155839920044, max_abs=4.5, mean_rel=0.16416576504707336, max_rel=1042.8360595703125, norm_rel=0.02398146130144596, ref_abs_avg=30.675973892211914, test_abs_avg=30.677614212036133
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.5704813003540039, max_abs=2.330078125, mean_rel=0.09885143488645554, max_rel=15.004024505615234, norm_rel=0.02580329030752182, ref_abs_avg=22.298418045043945, test_abs_avg=22.258392333984375
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.6926225423812866, max_abs=4.25, mean_rel=0.16939061880111694, max_rel=853.4279174804688, norm_rel=0.024585682898759842, ref_abs_avg=28.24618911743164, test_abs_avg=28.244178771972656
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.6850306987762451, max_abs=4.5, mean_rel=0.15241990983486176, max_rel=894.77978515625, norm_rel=0.024656040593981743, ref_abs_avg=27.879772186279297, test_abs_avg=27.88215446472168
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.5217934250831604, max_abs=1.96875, mean_rel=0.7034094929695129, max_rel=318.6611022949219, norm_rel=0.025362545624375343, ref_abs_avg=20.547639846801758, test_abs_avg=20.560468673706055
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.646273136138916, max_abs=4.125, mean_rel=0.15544387698173523, max_rel=628.5491333007812, norm_rel=0.02439962513744831, ref_abs_avg=26.577991485595703, test_abs_avg=26.578773498535156
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.6376310586929321, max_abs=4.0625, mean_rel=0.15923762321472168, max_rel=1330.0924072265625, norm_rel=0.024137821048498154, ref_abs_avg=26.47996711730957, test_abs_avg=26.473278045654297
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.5074553489685059, max_abs=2.01953125, mean_rel=0.1817236989736557, max_rel=41.94701385498047, norm_rel=0.024044303223490715, ref_abs_avg=20.743974685668945, test_abs_avg=20.72000503540039
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.608192503452301, max_abs=3.96875, mean_rel=0.1701989471912384, max_rel=1895.5628662109375, norm_rel=0.02419091761112213, ref_abs_avg=25.19529151916504, test_abs_avg=25.194847106933594
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.5971099734306335, max_abs=3.875, mean_rel=0.15695475041866302, max_rel=717.6194458007812, norm_rel=0.02422506920993328, ref_abs_avg=24.69623374938965, test_abs_avg=24.695653915405273
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.49891090393066406, max_abs=2.2265625, mean_rel=0.09911487996578217, max_rel=5.2038469314575195, norm_rel=0.02399006113409996, ref_abs_avg=20.605052947998047, test_abs_avg=20.65930938720703
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.5752153396606445, max_abs=3.8125, mean_rel=0.15317904949188232, max_rel=843.4246215820312, norm_rel=0.023889046162366867, ref_abs_avg=24.128231048583984, test_abs_avg=24.12807846069336
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.5623699426651001, max_abs=3.5, mean_rel=0.15453079342842102, max_rel=731.0691528320312, norm_rel=0.023789379745721817, ref_abs_avg=23.667037963867188, test_abs_avg=23.66790008544922
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.4440650939941406, max_abs=1.75, mean_rel=0.08592021465301514, max_rel=6.737510681152344, norm_rel=0.023538880050182343, ref_abs_avg=18.750926971435547, test_abs_avg=18.76910400390625
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.5458951592445374, max_abs=4.25, mean_rel=0.14845606684684753, max_rel=637.0438232421875, norm_rel=0.02368454821407795, ref_abs_avg=23.082015991210938, test_abs_avg=23.082969665527344
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.5377035140991211, max_abs=3.75, mean_rel=0.15734046697616577, max_rel=471.9398193359375, norm_rel=0.023711180314421654, ref_abs_avg=22.717052459716797, test_abs_avg=22.724346160888672
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.4274125099182129, max_abs=1.625, mean_rel=0.12331149727106094, max_rel=8.915974617004395, norm_rel=0.023016182705760002, ref_abs_avg=18.331256866455078, test_abs_avg=18.336734771728516
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.5228220224380493, max_abs=3.5625, mean_rel=0.15175437927246094, max_rel=623.1726684570312, norm_rel=0.02334032952785492, ref_abs_avg=22.41339111328125, test_abs_avg=22.414020538330078
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.5090471506118774, max_abs=3.15625, mean_rel=0.15310733020305634, max_rel=860.3681640625, norm_rel=0.023296864703297615, ref_abs_avg=21.86574935913086, test_abs_avg=21.8603458404541
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.41517525911331177, max_abs=1.5, mean_rel=0.12219288945198059, max_rel=10.140774726867676, norm_rel=0.023401405662298203, ref_abs_avg=17.220752716064453, test_abs_avg=17.196155548095703
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.49464839696884155, max_abs=2.921875, mean_rel=0.1517329216003418, max_rel=675.8805541992188, norm_rel=0.023249711841344833, ref_abs_avg=21.26679229736328, test_abs_avg=21.267505645751953
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.4909956753253937, max_abs=3.25, mean_rel=0.14904366433620453, max_rel=726.0482788085938, norm_rel=0.023214884102344513, ref_abs_avg=21.175899505615234, test_abs_avg=21.17997932434082
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.45229572057724, max_abs=1.8125, mean_rel=0.16231724619865417, max_rel=12.050114631652832, norm_rel=0.024015069007873535, ref_abs_avg=18.568939208984375, test_abs_avg=18.52191162109375
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.5639740228652954, max_abs=3.9375, mean_rel=0.1604604870080948, max_rel=737.658935546875, norm_rel=0.024834612384438515, ref_abs_avg=22.781658172607422, test_abs_avg=22.783109664916992
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.5480839014053345, max_abs=4.25, mean_rel=0.15601250529289246, max_rel=1432.8782958984375, norm_rel=0.024549322202801704, ref_abs_avg=22.42728042602539, test_abs_avg=22.424266815185547
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.4324314594268799, max_abs=1.875, mean_rel=0.09168151766061783, max_rel=6.144999980926514, norm_rel=0.025085654109716415, ref_abs_avg=16.968997955322266, test_abs_avg=16.933547973632812
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.520384669303894, max_abs=3.75, mean_rel=0.15737640857696533, max_rel=1166.2716064453125, norm_rel=0.02449023723602295, ref_abs_avg=21.264896392822266, test_abs_avg=21.267120361328125
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.5099042654037476, max_abs=3.25, mean_rel=0.1535586416721344, max_rel=930.0496826171875, norm_rel=0.02438201569020748, ref_abs_avg=20.952611923217773, test_abs_avg=20.9503173828125
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.3740088939666748, max_abs=1.625, mean_rel=0.16785073280334473, max_rel=28.857393264770508, norm_rel=0.02350178174674511, ref_abs_avg=16.623029708862305, test_abs_avg=16.644058227539062
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.47949492931365967, max_abs=3.3125, mean_rel=0.16243794560432434, max_rel=1027.861328125, norm_rel=0.023834697902202606, ref_abs_avg=20.099353790283203, test_abs_avg=20.10054588317871
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.47267234325408936, max_abs=4.0, mean_rel=0.16716784238815308, max_rel=1022.4725952148438, norm_rel=0.02356715127825737, ref_abs_avg=20.01338768005371, test_abs_avg=20.014728546142578
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.3755011558532715, max_abs=1.328125, mean_rel=0.07495265454053879, max_rel=3.1610565185546875, norm_rel=0.02258129231631756, ref_abs_avg=16.565650939941406, test_abs_avg=16.534915924072266
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.44903677701950073, max_abs=3.25, mean_rel=0.15772177278995514, max_rel=977.3783569335938, norm_rel=0.023402336984872818, ref_abs_avg=19.161151885986328, test_abs_avg=19.16128158569336
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.4418019652366638, max_abs=3.0, mean_rel=0.16610509157180786, max_rel=1353.1107177734375, norm_rel=0.023053640499711037, ref_abs_avg=19.140321731567383, test_abs_avg=19.142292022705078
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.3413695693016052, max_abs=1.25, mean_rel=0.12775388360023499, max_rel=26.49936294555664, norm_rel=0.022576088085770607, ref_abs_avg=15.238761901855469, test_abs_avg=15.240143775939941
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.4265352487564087, max_abs=3.0, mean_rel=0.15906170010566711, max_rel=882.8873901367188, norm_rel=0.023056725040078163, ref_abs_avg=18.459152221679688, test_abs_avg=18.459482192993164
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.4137621819972992, max_abs=2.8125, mean_rel=0.14914250373840332, max_rel=874.2982177734375, norm_rel=0.02262169122695923, ref_abs_avg=18.289257049560547, test_abs_avg=18.287033081054688
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.32759952545166016, max_abs=1.28125, mean_rel=0.07293924689292908, max_rel=3.2182672023773193, norm_rel=0.02199607715010643, ref_abs_avg=15.10603141784668, test_abs_avg=15.142138481140137
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.39859190583229065, max_abs=3.0, mean_rel=0.14549748599529266, max_rel=403.0989685058594, norm_rel=0.022608162835240364, ref_abs_avg=17.621633529663086, test_abs_avg=17.62240219116211
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.3903028964996338, max_abs=3.5, mean_rel=0.1411161869764328, max_rel=530.2682495117188, norm_rel=0.022053441032767296, ref_abs_avg=17.68488121032715, test_abs_avg=17.686397552490234
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.28407034277915955, max_abs=1.625, mean_rel=0.17343538999557495, max_rel=27.788484573364258, norm_rel=0.020687276497483253, ref_abs_avg=14.100732803344727, test_abs_avg=14.109785079956055
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.38158148527145386, max_abs=3.40625, mean_rel=0.14060652256011963, max_rel=557.1993408203125, norm_rel=0.022077741101384163, ref_abs_avg=17.246339797973633, test_abs_avg=17.246437072753906
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.37580472230911255, max_abs=2.75, mean_rel=0.14504548907279968, max_rel=900.7550048828125, norm_rel=0.022329021245241165, ref_abs_avg=16.84173583984375, test_abs_avg=16.83704376220703
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.3066725730895996, max_abs=1.375, mean_rel=0.12027086317539215, max_rel=22.953609466552734, norm_rel=0.021183496341109276, ref_abs_avg=14.514824867248535, test_abs_avg=14.487739562988281
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.36343079805374146, max_abs=2.75, mean_rel=0.13942250609397888, max_rel=458.7262268066406, norm_rel=0.021870417520403862, ref_abs_avg=16.60662841796875, test_abs_avg=16.607160568237305
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.35908621549606323, max_abs=2.5, mean_rel=0.1460137516260147, max_rel=627.4808959960938, norm_rel=0.021702563390135765, ref_abs_avg=16.490877151489258, test_abs_avg=16.484962463378906
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.3327164649963379, max_abs=1.41796875, mean_rel=0.10291002690792084, max_rel=12.063350677490234, norm_rel=0.022358592599630356, ref_abs_avg=14.833686828613281, test_abs_avg=14.806640625
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.4079819917678833, max_abs=3.0, mean_rel=0.15332479774951935, max_rel=761.5818481445312, norm_rel=0.022858120501041412, ref_abs_avg=17.82378578186035, test_abs_avg=17.82268524169922
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.3933984935283661, max_abs=3.125, mean_rel=0.1420639157295227, max_rel=673.6514892578125, norm_rel=0.022611455991864204, ref_abs_avg=17.395557403564453, test_abs_avg=17.396181106567383
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.33734893798828125, max_abs=1.125, mean_rel=0.09039846807718277, max_rel=7.119312286376953, norm_rel=0.02292180433869362, ref_abs_avg=14.455564498901367, test_abs_avg=14.461507797241211
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.38064759969711304, max_abs=2.8125, mean_rel=0.14990825951099396, max_rel=838.1088256835938, norm_rel=0.02251194231212139, ref_abs_avg=16.881980895996094, test_abs_avg=16.881877899169922
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.372055321931839, max_abs=3.375, mean_rel=0.14060547947883606, max_rel=642.1122436523438, norm_rel=0.02226482331752777, ref_abs_avg=16.68224334716797, test_abs_avg=16.681068420410156
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.28350114822387695, max_abs=1.375, mean_rel=0.10291410982608795, max_rel=11.852362632751465, norm_rel=0.02051551453769207, ref_abs_avg=14.060365676879883, test_abs_avg=14.073526382446289
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.3474351465702057, max_abs=2.75, mean_rel=0.13328826427459717, max_rel=389.8624267578125, norm_rel=0.021704109385609627, ref_abs_avg=15.997983932495117, test_abs_avg=15.998565673828125
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.34202438592910767, max_abs=3.0, mean_rel=0.1571049988269806, max_rel=880.8492431640625, norm_rel=0.02181863598525524, ref_abs_avg=15.668106079101562, test_abs_avg=15.676664352416992
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.27866268157958984, max_abs=1.0, mean_rel=0.06983821094036102, max_rel=1.8438670635223389, norm_rel=0.022253461182117462, ref_abs_avg=12.285759925842285, test_abs_avg=12.283073425292969
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.3239912986755371, max_abs=3.0, mean_rel=0.12778061628341675, max_rel=355.5726318359375, norm_rel=0.021328076720237732, ref_abs_avg=15.17086410522461, test_abs_avg=15.170541763305664
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.32005825638771057, max_abs=3.75, mean_rel=0.13359801471233368, max_rel=480.21527099609375, norm_rel=0.02118464931845665, ref_abs_avg=15.097986221313477, test_abs_avg=15.094938278198242
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.24931931495666504, max_abs=1.125, mean_rel=0.1442784070968628, max_rel=20.925039291381836, norm_rel=0.021246779710054398, ref_abs_avg=11.788803100585938, test_abs_avg=11.803689956665039
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.2970232367515564, max_abs=2.625, mean_rel=0.12989521026611328, max_rel=701.7288208007812, norm_rel=0.020646532997488976, ref_abs_avg=14.420707702636719, test_abs_avg=14.419722557067871
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.29231852293014526, max_abs=3.0, mean_rel=0.13603422045707703, max_rel=903.2481079101562, norm_rel=0.02023913525044918, ref_abs_avg=14.473323822021484, test_abs_avg=14.474235534667969
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.23485469818115234, max_abs=0.90625, mean_rel=0.05815059691667557, max_rel=1.5820534229278564, norm_rel=0.01934521272778511, ref_abs_avg=12.211177825927734, test_abs_avg=12.204926490783691
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.2920045852661133, max_abs=3.125, mean_rel=0.11589778959751129, max_rel=602.5670166015625, norm_rel=0.02018645405769348, ref_abs_avg=14.53785514831543, test_abs_avg=14.537164688110352
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.28026649355888367, max_abs=3.09375, mean_rel=0.1324577033519745, max_rel=588.4859619140625, norm_rel=0.020366121083498, ref_abs_avg=13.968740463256836, test_abs_avg=13.960004806518555
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.22870373725891113, max_abs=1.0, mean_rel=0.12446033954620361, max_rel=29.793378829956055, norm_rel=0.020438482984900475, ref_abs_avg=11.221468925476074, test_abs_avg=11.23869514465332
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.2659234404563904, max_abs=2.9296875, mean_rel=0.11752226948738098, max_rel=496.8337707519531, norm_rel=0.019467217847704887, ref_abs_avg=13.809428215026855, test_abs_avg=13.808097839355469
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.254693865776062, max_abs=2.5, mean_rel=0.11632586270570755, max_rel=490.3312072753906, norm_rel=0.018452323973178864, ref_abs_avg=13.865254402160645, test_abs_avg=13.86646842956543
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.21685528755187988, max_abs=0.75, mean_rel=0.15277765691280365, max_rel=25.404054641723633, norm_rel=0.01934514380991459, ref_abs_avg=11.103204727172852, test_abs_avg=11.10446548461914
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.2521743178367615, max_abs=3.25, mean_rel=0.11627413332462311, max_rel=730.7393798828125, norm_rel=0.019040416926145554, ref_abs_avg=13.483221054077148, test_abs_avg=13.481914520263672
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.2470407783985138, max_abs=3.75, mean_rel=0.11600497364997864, max_rel=317.51519775390625, norm_rel=0.019129274412989616, ref_abs_avg=13.206521987915039, test_abs_avg=13.21316146850586
liger_forward vs paper_forward output: mean_abs=0.00015508531942032278, max_abs=0.03515625
liger_forward grad[0] vs paper_forward: mean_abs=0.003627280006185174, max_abs=0.234375, mean_rel=0.025658633559942245, max_rel=55.75831985473633, norm_rel=0.009628545492887497, ref_abs_avg=0.47123289108276367, test_abs_avg=0.47120845317840576
liger_forward grad[1] vs paper_forward: mean_abs=1.6225495338439941, max_abs=16.0, mean_rel=0.05227893590927124, max_rel=172.66677856445312, norm_rel=0.006714350543916225, ref_abs_avg=231.5425262451172, test_abs_avg=231.55885314941406
liger_forward grad[2] vs paper_forward: mean_abs=0.3459010124206543, max_abs=1.5, mean_rel=0.06099071353673935, max_rel=8.173903465270996, norm_rel=0.009051533415913582, ref_abs_avg=39.29471969604492, test_abs_avg=39.323726654052734
liger_forward grad[3] vs paper_forward: mean_abs=0.409906268119812, max_abs=2.75, mean_rel=0.05790012329816818, max_rel=688.5398559570312, norm_rel=0.008870304562151432, ref_abs_avg=47.883628845214844, test_abs_avg=47.883522033691406
liger_forward grad[4] vs paper_forward: mean_abs=0.3982657194137573, max_abs=3.0, mean_rel=0.058521632105112076, max_rel=615.4514770507812, norm_rel=0.008703020401299, ref_abs_avg=47.53989028930664, test_abs_avg=47.538414001464844
liger_forward grad[5] vs paper_forward: mean_abs=0.3059368133544922, max_abs=1.375, mean_rel=0.049633949995040894, max_rel=3.3874704837799072, norm_rel=0.009072625078260899, ref_abs_avg=34.11534118652344, test_abs_avg=34.1102180480957
liger_forward grad[6] vs paper_forward: mean_abs=0.3503882884979248, max_abs=2.5, mean_rel=0.05398984253406525, max_rel=542.1060791015625, norm_rel=0.008596709929406643, ref_abs_avg=42.341819763183594, test_abs_avg=42.34185028076172
liger_forward grad[7] vs paper_forward: mean_abs=0.335603266954422, max_abs=2.25, mean_rel=0.04930635541677475, max_rel=241.92483520507812, norm_rel=0.008337079547345638, ref_abs_avg=41.967689514160156, test_abs_avg=41.9678955078125
liger_forward grad[8] vs paper_forward: mean_abs=0.2593071460723877, max_abs=1.046875, mean_rel=0.06798768043518066, max_rel=20.43097496032715, norm_rel=0.008060859516263008, ref_abs_avg=32.527496337890625, test_abs_avg=32.523990631103516
liger_forward grad[9] vs paper_forward: mean_abs=0.3156195282936096, max_abs=2.0, mean_rel=0.0552322119474411, max_rel=338.7806091308594, norm_rel=0.008412565104663372, ref_abs_avg=39.03321838378906, test_abs_avg=39.032958984375
liger_forward grad[10] vs paper_forward: mean_abs=0.3037737309932709, max_abs=2.0, mean_rel=0.05631985887885094, max_rel=403.6575927734375, norm_rel=0.008246872574090958, ref_abs_avg=38.433815002441406, test_abs_avg=38.435272216796875
liger_forward grad[11] vs paper_forward: mean_abs=0.22233819961547852, max_abs=0.875, mean_rel=0.0288422591984272, max_rel=2.135348081588745, norm_rel=0.007764913607388735, ref_abs_avg=30.580049514770508, test_abs_avg=30.59225845336914
liger_forward grad[12] vs paper_forward: mean_abs=0.2879713773727417, max_abs=2.0, mean_rel=0.05563175678253174, max_rel=473.90032958984375, norm_rel=0.008289765566587448, ref_abs_avg=36.194068908691406, test_abs_avg=36.19353485107422
liger_forward grad[13] vs paper_forward: mean_abs=0.27873069047927856, max_abs=2.0, mean_rel=0.053314827382564545, max_rel=270.9910888671875, norm_rel=0.00809655711054802, ref_abs_avg=35.99748992919922, test_abs_avg=35.99612045288086
liger_forward grad[14] vs paper_forward: mean_abs=0.23375892639160156, max_abs=1.0, mean_rel=0.03209473192691803, max_rel=2.343245029449463, norm_rel=0.008882807567715645, ref_abs_avg=27.546123504638672, test_abs_avg=27.543123245239258
liger_forward grad[15] vs paper_forward: mean_abs=0.26540762186050415, max_abs=2.0, mean_rel=0.05300387740135193, max_rel=391.5345153808594, norm_rel=0.008114303462207317, ref_abs_avg=34.13508224487305, test_abs_avg=34.13484573364258
liger_forward grad[16] vs paper_forward: mean_abs=0.25817248225212097, max_abs=2.0, mean_rel=0.06149603798985481, max_rel=530.6881103515625, norm_rel=0.008005387149751186, ref_abs_avg=33.70772171020508, test_abs_avg=33.70863342285156
liger_forward grad[17] vs paper_forward: mean_abs=0.21002376079559326, max_abs=0.75, mean_rel=0.03159550949931145, max_rel=3.403679847717285, norm_rel=0.008404928259551525, ref_abs_avg=25.627838134765625, test_abs_avg=25.626476287841797
liger_forward grad[18] vs paper_forward: mean_abs=0.2483862340450287, max_abs=1.75, mean_rel=0.04843442887067795, max_rel=330.36334228515625, norm_rel=0.008007492870092392, ref_abs_avg=32.40326690673828, test_abs_avg=32.40325164794922
liger_forward grad[19] vs paper_forward: mean_abs=0.23851455748081207, max_abs=1.5, mean_rel=0.05148940533399582, max_rel=218.17250061035156, norm_rel=0.00788299273699522, ref_abs_avg=31.68439483642578, test_abs_avg=31.684062957763672
liger_forward grad[20] vs paper_forward: mean_abs=0.19751548767089844, max_abs=0.75, mean_rel=0.023274267092347145, max_rel=1.8843958377838135, norm_rel=0.008399004116654396, ref_abs_avg=25.144874572753906, test_abs_avg=25.131357192993164
liger_forward grad[21] vs paper_forward: mean_abs=0.22956207394599915, max_abs=1.59375, mean_rel=0.051732562482357025, max_rel=330.1163330078125, norm_rel=0.007905532605946064, ref_abs_avg=30.394535064697266, test_abs_avg=30.394790649414062
liger_forward grad[22] vs paper_forward: mean_abs=0.2201870083808899, max_abs=1.5, mean_rel=0.052713990211486816, max_rel=259.9601135253906, norm_rel=0.007698922883719206, ref_abs_avg=30.046524047851562, test_abs_avg=30.045820236206055
liger_forward grad[23] vs paper_forward: mean_abs=0.1874537467956543, max_abs=0.75, mean_rel=0.040450841188430786, max_rel=3.31194806098938, norm_rel=0.008307618089020252, ref_abs_avg=23.58572006225586, test_abs_avg=23.588939666748047
liger_forward grad[24] vs paper_forward: mean_abs=0.2166372388601303, max_abs=1.5, mean_rel=0.05104422569274902, max_rel=355.0661315917969, norm_rel=0.007771220989525318, ref_abs_avg=29.22460174560547, test_abs_avg=29.224966049194336
liger_forward grad[25] vs paper_forward: mean_abs=0.2082585096359253, max_abs=1.5, mean_rel=0.053723569959402084, max_rel=433.9256286621094, norm_rel=0.007551922928541899, ref_abs_avg=28.95445442199707, test_abs_avg=28.952348709106445
liger_forward grad[26] vs paper_forward: mean_abs=0.19396400451660156, max_abs=0.8125, mean_rel=0.036637578159570694, max_rel=4.211387634277344, norm_rel=0.007969450205564499, ref_abs_avg=24.804443359375, test_abs_avg=24.807594299316406
liger_forward grad[27] vs paper_forward: mean_abs=0.2371368110179901, max_abs=1.75, mean_rel=0.04863005131483078, max_rel=438.8909606933594, norm_rel=0.007986464537680149, ref_abs_avg=31.06580924987793, test_abs_avg=31.065841674804688
liger_forward grad[28] vs paper_forward: mean_abs=0.2271067053079605, max_abs=1.75, mean_rel=0.05076799914240837, max_rel=295.880859375, norm_rel=0.007773202378302813, ref_abs_avg=30.675973892211914, test_abs_avg=30.676654815673828
liger_forward grad[29] vs paper_forward: mean_abs=0.17951595783233643, max_abs=0.625, mean_rel=0.025991948321461678, max_rel=1.307978630065918, norm_rel=0.008217751048505306, ref_abs_avg=22.298418045043945, test_abs_avg=22.304845809936523
liger_forward grad[30] vs paper_forward: mean_abs=0.21008354425430298, max_abs=1.625, mean_rel=0.05086565762758255, max_rel=362.9698791503906, norm_rel=0.007794112432748079, ref_abs_avg=28.24618911743164, test_abs_avg=28.245586395263672
liger_forward grad[31] vs paper_forward: mean_abs=0.20193445682525635, max_abs=1.5, mean_rel=0.04343302920460701, max_rel=87.9046630859375, norm_rel=0.007605277933180332, ref_abs_avg=27.879772186279297, test_abs_avg=27.881153106689453
liger_forward grad[32] vs paper_forward: mean_abs=0.15502886474132538, max_abs=0.66015625, mean_rel=0.1979372799396515, max_rel=90.00225830078125, norm_rel=0.007925622165203094, ref_abs_avg=20.547639846801758, test_abs_avg=20.544126510620117
liger_forward grad[33] vs paper_forward: mean_abs=0.19274196028709412, max_abs=1.5, mean_rel=0.04650482162833214, max_rel=292.89373779296875, norm_rel=0.007608928717672825, ref_abs_avg=26.577991485595703, test_abs_avg=26.577167510986328
liger_forward grad[34] vs paper_forward: mean_abs=0.18509052693843842, max_abs=1.25, mean_rel=0.04638504981994629, max_rel=240.87820434570312, norm_rel=0.007353180553764105, ref_abs_avg=26.47996711730957, test_abs_avg=26.47882843017578
liger_forward grad[35] vs paper_forward: mean_abs=0.15196466445922852, max_abs=0.796875, mean_rel=0.05206930264830589, max_rel=7.043734073638916, norm_rel=0.007668021135032177, ref_abs_avg=20.743974685668945, test_abs_avg=20.752422332763672
liger_forward grad[36] vs paper_forward: mean_abs=0.17996874451637268, max_abs=1.25, mean_rel=0.05058452486991882, max_rel=297.575927734375, norm_rel=0.007502774242311716, ref_abs_avg=25.19529151916504, test_abs_avg=25.19516944885254
liger_forward grad[37] vs paper_forward: mean_abs=0.17399735748767853, max_abs=1.0, mean_rel=0.0455087274312973, max_rel=220.88185119628906, norm_rel=0.007417787332087755, ref_abs_avg=24.69623374938965, test_abs_avg=24.6970272064209
liger_forward grad[38] vs paper_forward: mean_abs=0.13602590560913086, max_abs=0.5, mean_rel=0.028115810826420784, max_rel=1.5626747608184814, norm_rel=0.006961164530366659, ref_abs_avg=20.605052947998047, test_abs_avg=20.586986541748047
liger_forward grad[39] vs paper_forward: mean_abs=0.1682567596435547, max_abs=1.25, mean_rel=0.04419064149260521, max_rel=217.7806396484375, norm_rel=0.007334663067013025, ref_abs_avg=24.128231048583984, test_abs_avg=24.127307891845703
liger_forward grad[40] vs paper_forward: mean_abs=0.16182324290275574, max_abs=1.0, mean_rel=0.04455358535051346, max_rel=151.5266571044922, norm_rel=0.007207457907497883, ref_abs_avg=23.667037963867188, test_abs_avg=23.666580200195312
liger_forward grad[41] vs paper_forward: mean_abs=0.13055658340454102, max_abs=0.5, mean_rel=0.039676859974861145, max_rel=4.8266801834106445, norm_rel=0.007343003526329994, ref_abs_avg=18.750926971435547, test_abs_avg=18.758636474609375
liger_forward grad[42] vs paper_forward: mean_abs=0.1582314372062683, max_abs=1.0, mean_rel=0.04387893155217171, max_rel=207.18630981445312, norm_rel=0.007216735742986202, ref_abs_avg=23.082015991210938, test_abs_avg=23.08159637451172
liger_forward grad[43] vs paper_forward: mean_abs=0.15207447111606598, max_abs=1.0, mean_rel=0.044789932668209076, max_rel=184.00181579589844, norm_rel=0.007066030520945787, ref_abs_avg=22.717052459716797, test_abs_avg=22.717693328857422
liger_forward grad[44] vs paper_forward: mean_abs=0.12734127044677734, max_abs=0.5, mean_rel=0.029301676899194717, max_rel=1.3495506048202515, norm_rel=0.007222939282655716, ref_abs_avg=18.331256866455078, test_abs_avg=18.322368621826172
liger_forward grad[45] vs paper_forward: mean_abs=0.15024003386497498, max_abs=1.125, mean_rel=0.04469333216547966, max_rel=187.32943725585938, norm_rel=0.007069645915180445, ref_abs_avg=22.41339111328125, test_abs_avg=22.412921905517578
liger_forward grad[46] vs paper_forward: mean_abs=0.14546167850494385, max_abs=1.0, mean_rel=0.04137168079614639, max_rel=205.1707000732422, norm_rel=0.0070280032232403755, ref_abs_avg=21.86574935913086, test_abs_avg=21.86486053466797
liger_forward grad[47] vs paper_forward: mean_abs=0.11259662359952927, max_abs=0.5, mean_rel=0.2518764138221741, max_rel=114.64845275878906, norm_rel=0.0067724441178143024, ref_abs_avg=17.220752716064453, test_abs_avg=17.21688461303711
liger_forward grad[48] vs paper_forward: mean_abs=0.14208677411079407, max_abs=1.0, mean_rel=0.04411948472261429, max_rel=183.8674774169922, norm_rel=0.007043400313705206, ref_abs_avg=21.26679229736328, test_abs_avg=21.266414642333984
liger_forward grad[49] vs paper_forward: mean_abs=0.13693282008171082, max_abs=1.0, mean_rel=0.04290952533483505, max_rel=203.5479736328125, norm_rel=0.006859813816845417, ref_abs_avg=21.175899505615234, test_abs_avg=21.175701141357422
liger_forward grad[50] vs paper_forward: mean_abs=0.13682222366333008, max_abs=0.625, mean_rel=0.04281221330165863, max_rel=2.6163532733917236, norm_rel=0.007837959565222263, ref_abs_avg=18.568939208984375, test_abs_avg=18.551612854003906
liger_forward grad[51] vs paper_forward: mean_abs=0.16360142827033997, max_abs=1.125, mean_rel=0.046462222933769226, max_rel=399.034423828125, norm_rel=0.007543065585196018, ref_abs_avg=22.781658172607422, test_abs_avg=22.780982971191406
liger_forward grad[52] vs paper_forward: mean_abs=0.1576465368270874, max_abs=1.125, mean_rel=0.042161453515291214, max_rel=139.25148010253906, norm_rel=0.007404835429042578, ref_abs_avg=22.42728042602539, test_abs_avg=22.427734375
liger_forward grad[53] vs paper_forward: mean_abs=0.12795305252075195, max_abs=0.5, mean_rel=0.06249411404132843, max_rel=11.844407081604004, norm_rel=0.007714496925473213, ref_abs_avg=16.968997955322266, test_abs_avg=16.964975357055664
liger_forward grad[54] vs paper_forward: mean_abs=0.14970532059669495, max_abs=1.3125, mean_rel=0.04635322093963623, max_rel=226.7201385498047, norm_rel=0.0073961070738732815, ref_abs_avg=21.264896392822266, test_abs_avg=21.26421356201172
liger_forward grad[55] vs paper_forward: mean_abs=0.14351752400398254, max_abs=1.0, mean_rel=0.045005083084106445, max_rel=484.4837646484375, norm_rel=0.007216337136924267, ref_abs_avg=20.952611923217773, test_abs_avg=20.952045440673828
liger_forward grad[56] vs paper_forward: mean_abs=0.10732074081897736, max_abs=0.5625, mean_rel=0.049506958574056625, max_rel=11.90824031829834, norm_rel=0.007062938064336777, ref_abs_avg=16.623029708862305, test_abs_avg=16.629924774169922
liger_forward grad[57] vs paper_forward: mean_abs=0.13698624074459076, max_abs=1.0, mean_rel=0.04534514993429184, max_rel=366.45513916015625, norm_rel=0.007165994960814714, ref_abs_avg=20.099353790283203, test_abs_avg=20.099287033081055
liger_forward grad[58] vs paper_forward: mean_abs=0.13214030861854553, max_abs=1.0, mean_rel=0.04713946208357811, max_rel=225.08700561523438, norm_rel=0.0069718919694423676, ref_abs_avg=20.01338768005371, test_abs_avg=20.012264251708984
liger_forward grad[59] vs paper_forward: mean_abs=0.10881996154785156, max_abs=0.5, mean_rel=0.020125355571508408, max_rel=1.050849199295044, norm_rel=0.006963404361158609, ref_abs_avg=16.565650939941406, test_abs_avg=16.560028076171875
liger_forward grad[60] vs paper_forward: mean_abs=0.127603217959404, max_abs=1.0, mean_rel=0.043431445956230164, max_rel=224.11691284179688, norm_rel=0.007016027811914682, ref_abs_avg=19.161151885986328, test_abs_avg=19.161476135253906
liger_forward grad[61] vs paper_forward: mean_abs=0.12390784919261932, max_abs=1.0, mean_rel=0.0430806539952755, max_rel=279.9846496582031, norm_rel=0.006844589486718178, ref_abs_avg=19.140321731567383, test_abs_avg=19.139263153076172
liger_forward grad[62] vs paper_forward: mean_abs=0.11016279458999634, max_abs=0.4375, mean_rel=0.07913874834775925, max_rel=21.43962287902832, norm_rel=0.0075363521464169025, ref_abs_avg=15.238761901855469, test_abs_avg=15.240667343139648
liger_forward grad[63] vs paper_forward: mean_abs=0.1207590401172638, max_abs=1.0, mean_rel=0.04480290412902832, max_rel=193.3428192138672, norm_rel=0.006900099106132984, ref_abs_avg=18.459152221679688, test_abs_avg=18.45935821533203
liger_forward grad[64] vs paper_forward: mean_abs=0.11540496349334717, max_abs=1.0, mean_rel=0.039938557893037796, max_rel=244.9813995361328, norm_rel=0.006695855408906937, ref_abs_avg=18.289257049560547, test_abs_avg=18.289714813232422
liger_forward grad[65] vs paper_forward: mean_abs=0.092559814453125, max_abs=0.375, mean_rel=0.027073657140135765, max_rel=1.7751086950302124, norm_rel=0.006451899651437998, ref_abs_avg=15.10603141784668, test_abs_avg=15.095990180969238
liger_forward grad[66] vs paper_forward: mean_abs=0.11238233745098114, max_abs=1.0, mean_rel=0.04160100221633911, max_rel=217.47142028808594, norm_rel=0.006752549204975367, ref_abs_avg=17.621633529663086, test_abs_avg=17.62152099609375
liger_forward grad[67] vs paper_forward: mean_abs=0.10875372588634491, max_abs=1.0, mean_rel=0.03863418102264404, max_rel=211.1020050048828, norm_rel=0.006532812956720591, ref_abs_avg=17.68488121032715, test_abs_avg=17.685306549072266
liger_forward grad[68] vs paper_forward: mean_abs=0.0911475419998169, max_abs=0.40625, mean_rel=0.04129430279135704, max_rel=7.056883335113525, norm_rel=0.006819468457251787, ref_abs_avg=14.100732803344727, test_abs_avg=14.097436904907227
liger_forward grad[69] vs paper_forward: mean_abs=0.10697951912879944, max_abs=1.0, mean_rel=0.03866150230169296, max_rel=198.34559631347656, norm_rel=0.006571505218744278, ref_abs_avg=17.246339797973633, test_abs_avg=17.246196746826172
liger_forward grad[70] vs paper_forward: mean_abs=0.10358898341655731, max_abs=1.0, mean_rel=0.0402817577123642, max_rel=217.80380249023438, norm_rel=0.006540250964462757, ref_abs_avg=16.84173583984375, test_abs_avg=16.8413143157959
liger_forward grad[71] vs paper_forward: mean_abs=0.08249878883361816, max_abs=0.3046875, mean_rel=0.026803631335496902, max_rel=2.5107152462005615, norm_rel=0.006187567487359047, ref_abs_avg=14.514824867248535, test_abs_avg=14.51198959350586
liger_forward grad[72] vs paper_forward: mean_abs=0.10164090991020203, max_abs=1.0, mean_rel=0.03999940678477287, max_rel=250.1974639892578, norm_rel=0.006503785960376263, ref_abs_avg=16.60662841796875, test_abs_avg=16.606529235839844
liger_forward grad[73] vs paper_forward: mean_abs=0.09906952828168869, max_abs=1.0, mean_rel=0.037898384034633636, max_rel=97.10472869873047, norm_rel=0.0063980016857385635, ref_abs_avg=16.490877151489258, test_abs_avg=16.49171257019043
liger_forward grad[74] vs paper_forward: mean_abs=0.09184181690216064, max_abs=0.375, mean_rel=0.031718626618385315, max_rel=3.663410186767578, norm_rel=0.006589749827980995, ref_abs_avg=14.833686828613281, test_abs_avg=14.832547187805176
liger_forward grad[75] vs paper_forward: mean_abs=0.11805150657892227, max_abs=1.0, mean_rel=0.04422347992658615, max_rel=167.586669921875, norm_rel=0.006980029866099358, ref_abs_avg=17.82378578186035, test_abs_avg=17.823902130126953
liger_forward grad[76] vs paper_forward: mean_abs=0.11438187956809998, max_abs=1.0, mean_rel=0.04231535643339157, max_rel=196.47537231445312, norm_rel=0.00696125952526927, ref_abs_avg=17.395557403564453, test_abs_avg=17.397457122802734
liger_forward grad[77] vs paper_forward: mean_abs=0.08840727806091309, max_abs=0.5078125, mean_rel=0.024228587746620178, max_rel=2.289797782897949, norm_rel=0.006617027334868908, ref_abs_avg=14.455564498901367, test_abs_avg=14.460967063903809
liger_forward grad[78] vs paper_forward: mean_abs=0.11042258888483047, max_abs=1.0, mean_rel=0.04322509467601776, max_rel=271.1703796386719, norm_rel=0.006909042596817017, ref_abs_avg=16.881980895996094, test_abs_avg=16.88193702697754
liger_forward grad[79] vs paper_forward: mean_abs=0.10470731556415558, max_abs=1.0, mean_rel=0.03996595740318298, max_rel=187.69541931152344, norm_rel=0.006651781965047121, ref_abs_avg=16.68224334716797, test_abs_avg=16.682876586914062
liger_forward grad[80] vs paper_forward: mean_abs=0.08859029412269592, max_abs=0.375, mean_rel=0.0305604487657547, max_rel=1.8939393758773804, norm_rel=0.006725792307406664, ref_abs_avg=14.060365676879883, test_abs_avg=14.056032180786133
liger_forward grad[81] vs paper_forward: mean_abs=0.10035640746355057, max_abs=1.0, mean_rel=0.04021957516670227, max_rel=300.78125, norm_rel=0.006649276707321405, ref_abs_avg=15.997983932495117, test_abs_avg=15.998037338256836
liger_forward grad[82] vs paper_forward: mean_abs=0.09795589745044708, max_abs=1.0, mean_rel=0.04225321114063263, max_rel=447.3649597167969, norm_rel=0.006637010723352432, ref_abs_avg=15.668106079101562, test_abs_avg=15.67010498046875
liger_forward grad[83] vs paper_forward: mean_abs=0.07901763916015625, max_abs=0.375, mean_rel=0.0225813128054142, max_rel=1.351309061050415, norm_rel=0.006726058665663004, ref_abs_avg=12.285759925842285, test_abs_avg=12.273114204406738
liger_forward grad[84] vs paper_forward: mean_abs=0.09345810115337372, max_abs=1.0, mean_rel=0.037082329392433167, max_rel=164.89431762695312, norm_rel=0.006551829166710377, ref_abs_avg=15.17086410522461, test_abs_avg=15.171146392822266
liger_forward grad[85] vs paper_forward: mean_abs=0.0897253155708313, max_abs=1.0, mean_rel=0.03817179426550865, max_rel=167.96620178222656, norm_rel=0.006353350821882486, ref_abs_avg=15.097986221313477, test_abs_avg=15.099825859069824
liger_forward grad[86] vs paper_forward: mean_abs=0.07519221305847168, max_abs=0.3125, mean_rel=0.020872771739959717, max_rel=0.6694254875183105, norm_rel=0.0066184126771986485, ref_abs_avg=11.788803100585938, test_abs_avg=11.791136741638184
liger_forward grad[87] vs paper_forward: mean_abs=0.0853688195347786, max_abs=1.0, mean_rel=0.03666267544031143, max_rel=215.4940185546875, norm_rel=0.006338296923786402, ref_abs_avg=14.420707702636719, test_abs_avg=14.420629501342773
liger_forward grad[88] vs paper_forward: mean_abs=0.08439677953720093, max_abs=1.0, mean_rel=0.03725709393620491, max_rel=110.70258331298828, norm_rel=0.006265060044825077, ref_abs_avg=14.473323822021484, test_abs_avg=14.472878456115723
liger_forward grad[89] vs paper_forward: mean_abs=0.06547927856445312, max_abs=0.296875, mean_rel=0.01594659686088562, max_rel=0.4443970322608948, norm_rel=0.005995302926748991, ref_abs_avg=12.211177825927734, test_abs_avg=12.21051025390625
liger_forward grad[90] vs paper_forward: mean_abs=0.08333086967468262, max_abs=1.0, mean_rel=0.0332404226064682, max_rel=142.8323974609375, norm_rel=0.006172192748636007, ref_abs_avg=14.53785514831543, test_abs_avg=14.537633895874023
liger_forward grad[91] vs paper_forward: mean_abs=0.0800788402557373, max_abs=1.0, mean_rel=0.037683241069316864, max_rel=179.94268798828125, norm_rel=0.006231960840523243, ref_abs_avg=13.968740463256836, test_abs_avg=13.969440460205078
liger_forward grad[92] vs paper_forward: mean_abs=0.06045675277709961, max_abs=0.25, mean_rel=0.0342567078769207, max_rel=6.415598392486572, norm_rel=0.005997878964990377, ref_abs_avg=11.221468925476074, test_abs_avg=11.226028442382812
liger_forward grad[93] vs paper_forward: mean_abs=0.07614657282829285, max_abs=1.0, mean_rel=0.03366593271493912, max_rel=117.94757080078125, norm_rel=0.0060052950866520405, ref_abs_avg=13.809428215026855, test_abs_avg=13.809568405151367
liger_forward grad[94] vs paper_forward: mean_abs=0.07265417277812958, max_abs=1.0, mean_rel=0.031690601259469986, max_rel=151.06326293945312, norm_rel=0.0057240016758441925, ref_abs_avg=13.865254402160645, test_abs_avg=13.864402770996094
liger_forward grad[95] vs paper_forward: mean_abs=0.06207549571990967, max_abs=0.3125, mean_rel=0.04504218325018883, max_rel=9.442604064941406, norm_rel=0.006054278928786516, ref_abs_avg=11.103204727172852, test_abs_avg=11.104766845703125
liger_forward grad[96] vs paper_forward: mean_abs=0.07130291312932968, max_abs=1.0, mean_rel=0.03255563601851463, max_rel=110.30779266357422, norm_rel=0.0058275130577385426, ref_abs_avg=13.483221054077148, test_abs_avg=13.48299789428711
liger_forward grad[97] vs paper_forward: mean_abs=0.06863397359848022, max_abs=1.0, mean_rel=0.03174106776714325, max_rel=98.4455337524414, norm_rel=0.005769859999418259, ref_abs_avg=13.206521987915039, test_abs_avg=13.207879066467285

