identity layers + randn queries
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.67s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_out_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16'),
finished after 4.89s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.75s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.86s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.87s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.60s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 13.07s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (131072, 512, 1, 'torch.float32', 'torch.float32'),
finished after 1.79s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_backward_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 6.28s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_2_reduce_grad_pseudo_query_kernel,
with key as (131072, 512, 'torch.float32', 'torch.float32'),
finished after 1.74s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 47.11s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (131072, 512, 8, 'torch.float32', 'torch.float32'),
finished after 1.97s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 42.35s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 31.95s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 19.50s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
production_forward2 fwd+bwd:  191.611 ms
production_forward2 bwd-only: 172.504 ms
production_forward2 peak allocated: fwd=2.567 GiB, fwd+bwd=5.946 GiB
production_forward2 peak reserved:  fwd=2.930 GiB, fwd+bwd=8.680 GiB
torch_compile_phases_forward fwd+bwd:  165.823 ms
torch_compile_phases_forward bwd-only: 132.462 ms
torch_compile_phases_forward peak allocated: fwd=12.781 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.078 GiB, fwd+bwd=17.330 GiB
paper_forward fwd+bwd:  384.077 ms
paper_forward bwd-only: 303.774 ms
paper_forward peak allocated: fwd=29.706 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.723 GiB, fwd+bwd=32.473 GiB
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_backward_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 6.69s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 51.37s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 43.43s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 32.53s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 20.68s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None;
production_forward fwd+bwd:  113.511 ms
production_forward bwd-only: 95.946 ms
production_forward peak allocated: fwd=3.071 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=3.305 GiB, fwd+bwd=11.305 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016596447676420212, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008391141891479492, max_abs=0.3828125, mean_rel=0.07271900773048401, max_rel=130.4047088623047, norm_rel=0.019886896014213562, ref_abs_avg=0.4576581120491028, test_abs_avg=0.4576735496520996
production_forward grad[1] vs paper_forward: mean_abs=7.363656044006348, max_abs=56.0, mean_rel=0.1609993278980255, max_rel=621.06787109375, norm_rel=0.020853379741311073, ref_abs_avg=316.9703369140625, test_abs_avg=316.98919677734375
production_forward grad[2] vs paper_forward: mean_abs=1.1530885696411133, max_abs=4.5, mean_rel=0.1513150930404663, max_rel=30.532974243164062, norm_rel=0.02178768813610077, ref_abs_avg=54.396995544433594, test_abs_avg=54.33839797973633
production_forward grad[3] vs paper_forward: mean_abs=1.5700602531433105, max_abs=10.0, mean_rel=0.16661658883094788, max_rel=1392.24853515625, norm_rel=0.024591993540525436, ref_abs_avg=64.22499084472656, test_abs_avg=64.2299575805664
production_forward grad[4] vs paper_forward: mean_abs=1.4473540782928467, max_abs=10.0, mean_rel=0.5418697595596313, max_rel=5312.49951171875, norm_rel=0.022866183891892433, ref_abs_avg=63.645503997802734, test_abs_avg=63.652976989746094
production_forward grad[5] vs paper_forward: mean_abs=1.0135469436645508, max_abs=4.0, mean_rel=0.13630852103233337, max_rel=16.226064682006836, norm_rel=0.022393343970179558, ref_abs_avg=46.02235412597656, test_abs_avg=46.06592559814453
production_forward grad[6] vs paper_forward: mean_abs=1.3911974430084229, max_abs=9.0, mean_rel=0.163150355219841, max_rel=1220.84326171875, norm_rel=0.02444080263376236, ref_abs_avg=57.30668640136719, test_abs_avg=57.31420135498047
production_forward grad[7] vs paper_forward: mean_abs=1.2840189933776855, max_abs=8.0, mean_rel=0.3378220498561859, max_rel=3062.499755859375, norm_rel=0.0231090746819973, ref_abs_avg=55.828392028808594, test_abs_avg=55.831573486328125
production_forward grad[8] vs paper_forward: mean_abs=0.9479670524597168, max_abs=3.53125, mean_rel=0.09920472651720047, max_rel=12.940226554870605, norm_rel=0.02262883633375168, ref_abs_avg=42.657676696777344, test_abs_avg=42.607505798339844
production_forward grad[9] vs paper_forward: mean_abs=1.2668312788009644, max_abs=8.0, mean_rel=0.1673065423965454, max_rel=1397.424072265625, norm_rel=0.024300921708345413, ref_abs_avg=52.498939514160156, test_abs_avg=52.502132415771484
production_forward grad[10] vs paper_forward: mean_abs=1.1662487983703613, max_abs=8.0, mean_rel=0.3336919844150543, max_rel=3499.999755859375, norm_rel=0.022569986060261726, ref_abs_avg=51.901329040527344, test_abs_avg=51.90107345581055
production_forward grad[11] vs paper_forward: mean_abs=0.9442024230957031, max_abs=4.25, mean_rel=0.09832082688808441, max_rel=5.760941505432129, norm_rel=0.023947063833475113, ref_abs_avg=39.38050079345703, test_abs_avg=39.373817443847656
production_forward grad[12] vs paper_forward: mean_abs=1.1702669858932495, max_abs=8.0, mean_rel=0.16356845200061798, max_rel=2214.775390625, norm_rel=0.02405931055545807, ref_abs_avg=48.95586395263672, test_abs_avg=48.95472717285156
production_forward grad[13] vs paper_forward: mean_abs=1.0820707082748413, max_abs=6.75, mean_rel=0.35474565625190735, max_rel=2999.999755859375, norm_rel=0.022456496953964233, ref_abs_avg=48.48320007324219, test_abs_avg=48.48279571533203
production_forward grad[14] vs paper_forward: mean_abs=0.8802809715270996, max_abs=3.5, mean_rel=0.08549022674560547, max_rel=8.747641563415527, norm_rel=0.02367340214550495, ref_abs_avg=36.98554229736328, test_abs_avg=36.917091369628906
production_forward grad[15] vs paper_forward: mean_abs=1.097597360610962, max_abs=8.0, mean_rel=0.1576981544494629, max_rel=835.5299682617188, norm_rel=0.02394394390285015, ref_abs_avg=46.10129928588867, test_abs_avg=46.10132598876953
production_forward grad[16] vs paper_forward: mean_abs=1.0119247436523438, max_abs=5.75, mean_rel=0.3099798560142517, max_rel=2874.999755859375, norm_rel=0.022236395627260208, ref_abs_avg=45.71990203857422, test_abs_avg=45.727325439453125
production_forward grad[17] vs paper_forward: mean_abs=0.8315391540527344, max_abs=4.125, mean_rel=0.11436822265386581, max_rel=10.065361022949219, norm_rel=0.024554571136832237, ref_abs_avg=33.75377655029297, test_abs_avg=33.61229705810547
production_forward grad[18] vs paper_forward: mean_abs=1.041542410850525, max_abs=7.0, mean_rel=0.1541777104139328, max_rel=1378.86767578125, norm_rel=0.023812882602214813, ref_abs_avg=44.00061798095703, test_abs_avg=44.00141906738281
production_forward grad[19] vs paper_forward: mean_abs=0.9563435912132263, max_abs=5.75, mean_rel=0.2987505793571472, max_rel=3187.499755859375, norm_rel=0.02205779403448105, ref_abs_avg=43.446956634521484, test_abs_avg=43.4438591003418
production_forward grad[20] vs paper_forward: mean_abs=0.7907636761665344, max_abs=3.0, mean_rel=0.17524197697639465, max_rel=26.24574851989746, norm_rel=0.024088378995656967, ref_abs_avg=32.20331573486328, test_abs_avg=32.25804901123047
production_forward grad[21] vs paper_forward: mean_abs=0.9809786677360535, max_abs=6.0, mean_rel=0.16664892435073853, max_rel=2754.724365234375, norm_rel=0.023742608726024628, ref_abs_avg=41.55921173095703, test_abs_avg=41.56063461303711
production_forward grad[22] vs paper_forward: mean_abs=0.9017118215560913, max_abs=5.125, mean_rel=0.3042955994606018, max_rel=2343.75, norm_rel=0.02201882191002369, ref_abs_avg=41.16310119628906, test_abs_avg=41.17059326171875
production_forward grad[23] vs paper_forward: mean_abs=0.7161884307861328, max_abs=2.3125, mean_rel=0.11249388009309769, max_rel=10.250024795532227, norm_rel=0.02133816108107567, ref_abs_avg=33.39316940307617, test_abs_avg=33.426231384277344
production_forward grad[24] vs paper_forward: mean_abs=0.9429500102996826, max_abs=7.0, mean_rel=0.15548214316368103, max_rel=1530.2266845703125, norm_rel=0.023486770689487457, ref_abs_avg=40.391456604003906, test_abs_avg=40.39191436767578
production_forward grad[25] vs paper_forward: mean_abs=0.8666014671325684, max_abs=6.0, mean_rel=0.2864258885383606, max_rel=2749.999755859375, norm_rel=0.021974749863147736, ref_abs_avg=39.626956939697266, test_abs_avg=39.629817962646484
production_forward grad[26] vs paper_forward: mean_abs=0.8516031503677368, max_abs=4.1875, mean_rel=0.6211586594581604, max_rel=256.99658203125, norm_rel=0.024149080738425255, ref_abs_avg=35.92981719970703, test_abs_avg=35.98686599731445
production_forward grad[27] vs paper_forward: mean_abs=1.082561731338501, max_abs=8.0, mean_rel=0.18872790038585663, max_rel=1660.1119384765625, norm_rel=0.025457212701439857, ref_abs_avg=42.801849365234375, test_abs_avg=42.80531311035156
production_forward grad[28] vs paper_forward: mean_abs=0.994631290435791, max_abs=6.375, mean_rel=0.3789861798286438, max_rel=3562.499755859375, norm_rel=0.023944605141878128, ref_abs_avg=41.685646057128906, test_abs_avg=41.685630798339844
production_forward grad[29] vs paper_forward: mean_abs=0.8205646276473999, max_abs=2.75, mean_rel=0.15541599690914154, max_rel=17.91990852355957, norm_rel=0.02498750016093254, ref_abs_avg=31.705829620361328, test_abs_avg=31.64922332763672
production_forward grad[30] vs paper_forward: mean_abs=0.9982860088348389, max_abs=7.0, mean_rel=0.17941312491893768, max_rel=1802.234619140625, norm_rel=0.02569209784269333, ref_abs_avg=39.047080993652344, test_abs_avg=39.046791076660156
production_forward grad[31] vs paper_forward: mean_abs=0.9322892427444458, max_abs=6.5, mean_rel=0.351406991481781, max_rel=2624.999755859375, norm_rel=0.0243074931204319, ref_abs_avg=38.54276657104492, test_abs_avg=38.54594421386719
production_forward grad[32] vs paper_forward: mean_abs=0.7087192535400391, max_abs=2.75, mean_rel=0.055214062333106995, max_rel=5.045709609985352, norm_rel=0.022425705567002296, ref_abs_avg=32.443382263183594, test_abs_avg=32.39959716796875
production_forward grad[33] vs paper_forward: mean_abs=0.9405311346054077, max_abs=6.75, mean_rel=0.16895245015621185, max_rel=1454.8846435546875, norm_rel=0.025576045736670494, ref_abs_avg=36.95179748535156, test_abs_avg=36.95292663574219
production_forward grad[34] vs paper_forward: mean_abs=0.8772192001342773, max_abs=5.75, mean_rel=0.27153027057647705, max_rel=2781.249755859375, norm_rel=0.024038780480623245, ref_abs_avg=36.52764129638672, test_abs_avg=36.52537155151367
production_forward grad[35] vs paper_forward: mean_abs=0.6492919921875, max_abs=2.03125, mean_rel=0.07275854051113129, max_rel=4.1529717445373535, norm_rel=0.022571617737412453, ref_abs_avg=28.592866897583008, test_abs_avg=28.567354202270508
production_forward grad[36] vs paper_forward: mean_abs=0.8785191774368286, max_abs=6.5, mean_rel=0.16443592309951782, max_rel=1113.5274658203125, norm_rel=0.025376761332154274, ref_abs_avg=34.76472091674805, test_abs_avg=34.76689910888672
production_forward grad[37] vs paper_forward: mean_abs=0.8260149955749512, max_abs=5.375, mean_rel=0.2673993706703186, max_rel=1812.4998779296875, norm_rel=0.024195656180381775, ref_abs_avg=34.24823760986328, test_abs_avg=34.253211975097656
production_forward grad[38] vs paper_forward: mean_abs=0.6266632080078125, max_abs=2.751953125, mean_rel=0.08643607795238495, max_rel=5.233896732330322, norm_rel=0.022720150649547577, ref_abs_avg=28.486955642700195, test_abs_avg=28.480344772338867
production_forward grad[39] vs paper_forward: mean_abs=0.8327561616897583, max_abs=6.0, mean_rel=0.17433834075927734, max_rel=2322.90771484375, norm_rel=0.025208907201886177, ref_abs_avg=33.15801239013672, test_abs_avg=33.156646728515625
production_forward grad[40] vs paper_forward: mean_abs=0.7766683101654053, max_abs=5.0, mean_rel=0.285342812538147, max_rel=1781.2498779296875, norm_rel=0.023663103580474854, ref_abs_avg=32.909202575683594, test_abs_avg=32.908241271972656
production_forward grad[41] vs paper_forward: mean_abs=0.6011213064193726, max_abs=2.125, mean_rel=0.07398705184459686, max_rel=3.5301196575164795, norm_rel=0.021905045956373215, ref_abs_avg=27.190330505371094, test_abs_avg=27.179731369018555
production_forward grad[42] vs paper_forward: mean_abs=0.7895024418830872, max_abs=5.0, mean_rel=0.16307224333286285, max_rel=1059.2108154296875, norm_rel=0.024893058463931084, ref_abs_avg=31.846338272094727, test_abs_avg=31.846630096435547
production_forward grad[43] vs paper_forward: mean_abs=0.7368767857551575, max_abs=4.625, mean_rel=0.2844521403312683, max_rel=2749.999755859375, norm_rel=0.02357015572488308, ref_abs_avg=31.380191802978516, test_abs_avg=31.376941680908203
production_forward grad[44] vs paper_forward: mean_abs=0.5604970455169678, max_abs=2.125, mean_rel=0.10208606719970703, max_rel=6.451687812805176, norm_rel=0.02179223857820034, ref_abs_avg=25.97134017944336, test_abs_avg=25.987403869628906
production_forward grad[45] vs paper_forward: mean_abs=0.7538338303565979, max_abs=5.5, mean_rel=0.17929071187973022, max_rel=2396.68408203125, norm_rel=0.024754777550697327, ref_abs_avg=30.583213806152344, test_abs_avg=30.584627151489258
production_forward grad[46] vs paper_forward: mean_abs=0.7013819217681885, max_abs=4.125, mean_rel=0.2991097569465637, max_rel=2500.0, norm_rel=0.02327067032456398, ref_abs_avg=30.240280151367188, test_abs_avg=30.24173355102539
production_forward grad[47] vs paper_forward: mean_abs=0.5662448406219482, max_abs=2.125, mean_rel=0.3153879642486572, max_rel=93.98287963867188, norm_rel=0.024229606613516808, ref_abs_avg=23.363550186157227, test_abs_avg=23.436588287353516
production_forward grad[48] vs paper_forward: mean_abs=0.7232312560081482, max_abs=5.0, mean_rel=0.1643742024898529, max_rel=1565.2154541015625, norm_rel=0.024573158472776413, ref_abs_avg=29.506683349609375, test_abs_avg=29.508350372314453
production_forward grad[49] vs paper_forward: mean_abs=0.6771119236946106, max_abs=4.25, mean_rel=0.27699798345565796, max_rel=2281.25, norm_rel=0.023355960845947266, ref_abs_avg=29.091655731201172, test_abs_avg=29.086429595947266
production_forward grad[50] vs paper_forward: mean_abs=0.6428440809249878, max_abs=2.75, mean_rel=0.12432393431663513, max_rel=9.867631912231445, norm_rel=0.02494421787559986, ref_abs_avg=25.945098876953125, test_abs_avg=25.976398468017578
production_forward grad[51] vs paper_forward: mean_abs=0.8093384504318237, max_abs=6.0, mean_rel=0.1718103140592575, max_rel=1178.184814453125, norm_rel=0.025907209143042564, ref_abs_avg=31.374462127685547, test_abs_avg=31.37979507446289
production_forward grad[52] vs paper_forward: mean_abs=0.7526941299438477, max_abs=5.5, mean_rel=0.295559823513031, max_rel=2671.874755859375, norm_rel=0.02453799359500408, ref_abs_avg=30.789520263671875, test_abs_avg=30.794185638427734
production_forward grad[53] vs paper_forward: mean_abs=0.5991127490997314, max_abs=2.25, mean_rel=0.1020531952381134, max_rel=9.415915489196777, norm_rel=0.02537732571363449, ref_abs_avg=23.84389877319336, test_abs_avg=23.844371795654297
production_forward grad[54] vs paper_forward: mean_abs=0.755875825881958, max_abs=7.0, mean_rel=0.15688297152519226, max_rel=872.9669189453125, norm_rel=0.025425484403967857, ref_abs_avg=29.799884796142578, test_abs_avg=29.800739288330078
production_forward grad[55] vs paper_forward: mean_abs=0.7029091119766235, max_abs=5.0, mean_rel=0.26658865809440613, max_rel=2328.125, norm_rel=0.024146607145667076, ref_abs_avg=29.178089141845703, test_abs_avg=29.18107795715332
production_forward grad[56] vs paper_forward: mean_abs=0.5381045341491699, max_abs=2.125, mean_rel=0.15543171763420105, max_rel=34.729583740234375, norm_rel=0.024043668061494827, ref_abs_avg=23.393003463745117, test_abs_avg=23.380809783935547
production_forward grad[57] vs paper_forward: mean_abs=0.6982156038284302, max_abs=5.0, mean_rel=0.15728965401649475, max_rel=691.3233032226562, norm_rel=0.02506275847554207, ref_abs_avg=27.921274185180664, test_abs_avg=27.92405128479004
production_forward grad[58] vs paper_forward: mean_abs=0.6477036476135254, max_abs=4.25, mean_rel=0.2649415135383606, max_rel=2531.25, norm_rel=0.023439936339855194, ref_abs_avg=27.691734313964844, test_abs_avg=27.693225860595703
production_forward grad[59] vs paper_forward: mean_abs=0.5008597373962402, max_abs=1.828125, mean_rel=0.07402300089597702, max_rel=2.4702394008636475, norm_rel=0.022205134853720665, ref_abs_avg=22.526409149169922, test_abs_avg=22.52803611755371
production_forward grad[60] vs paper_forward: mean_abs=0.6577510237693787, max_abs=6.0, mean_rel=0.16541296243667603, max_rel=919.7869873046875, norm_rel=0.02463676407933235, ref_abs_avg=26.72528076171875, test_abs_avg=26.725690841674805
production_forward grad[61] vs paper_forward: mean_abs=0.6035306453704834, max_abs=4.5, mean_rel=0.24854515492916107, max_rel=2093.75, norm_rel=0.023064786568284035, ref_abs_avg=26.21833038330078, test_abs_avg=26.223899841308594
production_forward grad[62] vs paper_forward: mean_abs=0.4883277416229248, max_abs=2.15625, mean_rel=0.21611106395721436, max_rel=34.71159744262695, norm_rel=0.022527417168021202, ref_abs_avg=21.435123443603516, test_abs_avg=21.44489288330078
production_forward grad[63] vs paper_forward: mean_abs=0.6228782534599304, max_abs=5.0, mean_rel=0.15871424973011017, max_rel=1284.2808837890625, norm_rel=0.024352913722395897, ref_abs_avg=25.575550079345703, test_abs_avg=25.577068328857422
production_forward grad[64] vs paper_forward: mean_abs=0.5735796093940735, max_abs=3.875, mean_rel=0.20056596398353577, max_rel=1468.7498779296875, norm_rel=0.022835083305835724, ref_abs_avg=25.140933990478516, test_abs_avg=25.148174285888672
production_forward grad[65] vs paper_forward: mean_abs=0.4869990348815918, max_abs=1.75, mean_rel=0.1274198293685913, max_rel=9.217095375061035, norm_rel=0.023298710584640503, ref_abs_avg=20.292320251464844, test_abs_avg=20.281003952026367
production_forward grad[66] vs paper_forward: mean_abs=0.5874742269515991, max_abs=5.0, mean_rel=0.1425955593585968, max_rel=549.375, norm_rel=0.024106888100504875, ref_abs_avg=24.436878204345703, test_abs_avg=24.436702728271484
production_forward grad[67] vs paper_forward: mean_abs=0.5428962707519531, max_abs=4.5, mean_rel=0.2505965828895569, max_rel=1562.4998779296875, norm_rel=0.022142790257930756, ref_abs_avg=24.52066993713379, test_abs_avg=24.519977569580078
production_forward grad[68] vs paper_forward: mean_abs=0.4451899528503418, max_abs=2.0, mean_rel=0.10234188288450241, max_rel=7.924880027770996, norm_rel=0.023149771615862846, ref_abs_avg=19.202573776245117, test_abs_avg=19.1640567779541
production_forward grad[69] vs paper_forward: mean_abs=0.5672913789749146, max_abs=4.5, mean_rel=0.15023352205753326, max_rel=879.1011962890625, norm_rel=0.023778147995471954, ref_abs_avg=23.877925872802734, test_abs_avg=23.880516052246094
production_forward grad[70] vs paper_forward: mean_abs=0.5202361345291138, max_abs=3.625, mean_rel=0.21788184344768524, max_rel=1640.6248779296875, norm_rel=0.021992642432451248, ref_abs_avg=23.69558334350586, test_abs_avg=23.697250366210938
production_forward grad[71] vs paper_forward: mean_abs=0.4035775661468506, max_abs=1.5, mean_rel=0.20060941576957703, max_rel=32.13320541381836, norm_rel=0.020661551505327225, ref_abs_avg=19.040000915527344, test_abs_avg=19.081493377685547
production_forward grad[72] vs paper_forward: mean_abs=0.5398006439208984, max_abs=5.0, mean_rel=0.14701639115810394, max_rel=932.2598266601562, norm_rel=0.023434534668922424, ref_abs_avg=23.078731536865234, test_abs_avg=23.078739166259766
production_forward grad[73] vs paper_forward: mean_abs=0.49781012535095215, max_abs=3.875, mean_rel=0.23428289592266083, max_rel=1374.9998779296875, norm_rel=0.022108351811766624, ref_abs_avg=22.514820098876953, test_abs_avg=22.513978958129883
production_forward grad[74] vs paper_forward: mean_abs=0.4716777801513672, max_abs=2.1875, mean_rel=0.07249697297811508, max_rel=7.0627336502075195, norm_rel=0.02315209060907364, ref_abs_avg=20.592437744140625, test_abs_avg=20.555986404418945
production_forward grad[75] vs paper_forward: mean_abs=0.5992128849029541, max_abs=4.5, mean_rel=0.1563260555267334, max_rel=1158.898681640625, norm_rel=0.02459382452070713, ref_abs_avg=24.445781707763672, test_abs_avg=24.44609832763672
production_forward grad[76] vs paper_forward: mean_abs=0.5568830966949463, max_abs=4.5, mean_rel=0.24320557713508606, max_rel=3031.249755859375, norm_rel=0.02332822047173977, ref_abs_avg=23.97626304626465, test_abs_avg=23.981407165527344
production_forward grad[77] vs paper_forward: mean_abs=0.4313926696777344, max_abs=1.6875, mean_rel=0.08423486351966858, max_rel=3.548936128616333, norm_rel=0.02199646085500717, ref_abs_avg=19.778949737548828, test_abs_avg=19.776287078857422
production_forward grad[78] vs paper_forward: mean_abs=0.5625585317611694, max_abs=4.5, mean_rel=0.14882296323776245, max_rel=1065.4127197265625, norm_rel=0.023979132995009422, ref_abs_avg=23.469127655029297, test_abs_avg=23.47086524963379
production_forward grad[79] vs paper_forward: mean_abs=0.5096043348312378, max_abs=3.75, mean_rel=0.23258182406425476, max_rel=2437.5, norm_rel=0.02207976020872593, ref_abs_avg=23.11919403076172, test_abs_avg=23.122634887695312
production_forward grad[80] vs paper_forward: mean_abs=0.41747379302978516, max_abs=1.6953125, mean_rel=0.09592106938362122, max_rel=5.225978851318359, norm_rel=0.02312544547021389, ref_abs_avg=18.147197723388672, test_abs_avg=18.12141227722168
production_forward grad[81] vs paper_forward: mean_abs=0.5185064077377319, max_abs=4.0, mean_rel=0.14656411111354828, max_rel=1200.6253662109375, norm_rel=0.023555902764201164, ref_abs_avg=22.046449661254883, test_abs_avg=22.04763412475586
production_forward grad[82] vs paper_forward: mean_abs=0.47757822275161743, max_abs=3.125, mean_rel=0.21363908052444458, max_rel=1562.4998779296875, norm_rel=0.022146956995129585, ref_abs_avg=21.53233528137207, test_abs_avg=21.530475616455078
production_forward grad[83] vs paper_forward: mean_abs=0.3903054893016815, max_abs=2.125, mean_rel=0.07000789791345596, max_rel=3.41667103767395, norm_rel=0.022121233865618706, ref_abs_avg=18.040355682373047, test_abs_avg=18.05414581298828
production_forward grad[84] vs paper_forward: mean_abs=0.48092007637023926, max_abs=4.5, mean_rel=0.1353529691696167, max_rel=1631.5001220703125, norm_rel=0.022915419191122055, ref_abs_avg=21.07796287536621, test_abs_avg=21.07876205444336
production_forward grad[85] vs paper_forward: mean_abs=0.43120262026786804, max_abs=3.71875, mean_rel=0.1976548433303833, max_rel=1250.0, norm_rel=0.020767901092767715, ref_abs_avg=20.75836944580078, test_abs_avg=20.754987716674805
production_forward grad[86] vs paper_forward: mean_abs=0.3512314558029175, max_abs=1.7177734375, mean_rel=0.16421456634998322, max_rel=22.51677703857422, norm_rel=0.02147296629846096, ref_abs_avg=16.377769470214844, test_abs_avg=16.393535614013672
production_forward grad[87] vs paper_forward: mean_abs=0.4494384527206421, max_abs=4.5, mean_rel=0.14186245203018188, max_rel=903.779541015625, norm_rel=0.02227337472140789, ref_abs_avg=20.277992248535156, test_abs_avg=20.278913497924805
production_forward grad[88] vs paper_forward: mean_abs=0.40301376581192017, max_abs=3.5, mean_rel=0.16368219256401062, max_rel=925.7811889648438, norm_rel=0.020174963399767876, ref_abs_avg=20.022659301757812, test_abs_avg=20.011459350585938
production_forward grad[89] vs paper_forward: mean_abs=0.33159327507019043, max_abs=1.5, mean_rel=0.0922413170337677, max_rel=11.064273834228516, norm_rel=0.02099502645432949, ref_abs_avg=15.651432037353516, test_abs_avg=15.642428398132324
production_forward grad[90] vs paper_forward: mean_abs=0.42072775959968567, max_abs=4.5, mean_rel=0.13368800282478333, max_rel=536.9297485351562, norm_rel=0.02175223082304001, ref_abs_avg=19.513965606689453, test_abs_avg=19.51385498046875
production_forward grad[91] vs paper_forward: mean_abs=0.3855476379394531, max_abs=3.75, mean_rel=0.19120854139328003, max_rel=2187.5, norm_rel=0.01993071846663952, ref_abs_avg=19.365272521972656, test_abs_avg=19.369537353515625
production_forward grad[92] vs paper_forward: mean_abs=0.3201674222946167, max_abs=1.0625, mean_rel=0.09080380201339722, max_rel=7.697360515594482, norm_rel=0.020417965948581696, ref_abs_avg=15.290216445922852, test_abs_avg=15.278312683105469
production_forward grad[93] vs paper_forward: mean_abs=0.3901844620704651, max_abs=4.125, mean_rel=0.13177704811096191, max_rel=691.211669921875, norm_rel=0.021490860730409622, ref_abs_avg=18.39264678955078, test_abs_avg=18.39264678955078
production_forward grad[94] vs paper_forward: mean_abs=0.35412168502807617, max_abs=3.75, mean_rel=0.17387709021568298, max_rel=1437.4998779296875, norm_rel=0.019326595589518547, ref_abs_avg=18.51291275024414, test_abs_avg=18.51898956298828
production_forward grad[95] vs paper_forward: mean_abs=0.3013296127319336, max_abs=1.375, mean_rel=0.11043326556682587, max_rel=10.363049507141113, norm_rel=0.019898580387234688, ref_abs_avg=15.04410457611084, test_abs_avg=15.026397705078125
production_forward grad[96] vs paper_forward: mean_abs=0.37654250860214233, max_abs=4.75, mean_rel=0.12161329388618469, max_rel=1022.5014038085938, norm_rel=0.020785322412848473, ref_abs_avg=18.41585350036621, test_abs_avg=18.415538787841797
production_forward grad[97] vs paper_forward: mean_abs=0.3332233130931854, max_abs=3.75, mean_rel=0.15823085606098175, max_rel=937.4999389648438, norm_rel=0.019204720854759216, ref_abs_avg=17.740827560424805, test_abs_avg=17.740707397460938
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016631188336759806, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008734920993447304, max_abs=0.4765625, mean_rel=0.07536441832780838, max_rel=96.16439819335938, norm_rel=0.02058040350675583, ref_abs_avg=0.4576581120491028, test_abs_avg=0.4576607346534729
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.534849166870117, max_abs=50.0, mean_rel=0.17786674201488495, max_rel=667.8881225585938, norm_rel=0.0212464090436697, ref_abs_avg=316.9703369140625, test_abs_avg=317.0187683105469
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2435808181762695, max_abs=5.0625, mean_rel=0.13291826844215393, max_rel=19.956514358520508, norm_rel=0.023738516494631767, ref_abs_avg=54.396995544433594, test_abs_avg=54.35686111450195
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6199462413787842, max_abs=11.0, mean_rel=0.1714654564857483, max_rel=1340.3463134765625, norm_rel=0.025370096787810326, ref_abs_avg=64.22499084472656, test_abs_avg=64.22743225097656
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5025873184204102, max_abs=10.0, mean_rel=0.5550118684768677, max_rel=5000.0, norm_rel=0.023723706603050232, ref_abs_avg=63.645503997802734, test_abs_avg=63.651973724365234
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0626511573791504, max_abs=4.25, mean_rel=0.15698745846748352, max_rel=22.320037841796875, norm_rel=0.023553799837827682, ref_abs_avg=46.02235412597656, test_abs_avg=46.10277557373047
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4318498373031616, max_abs=9.0, mean_rel=0.17711952328681946, max_rel=1838.944580078125, norm_rel=0.02513822168111801, ref_abs_avg=57.30668640136719, test_abs_avg=57.313663482666016
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.328608512878418, max_abs=8.0, mean_rel=0.352510929107666, max_rel=3937.499755859375, norm_rel=0.023916209116578102, ref_abs_avg=55.828392028808594, test_abs_avg=55.82822036743164
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0193307399749756, max_abs=4.0, mean_rel=0.1166272982954979, max_rel=13.528241157531738, norm_rel=0.023940227925777435, ref_abs_avg=42.657676696777344, test_abs_avg=42.572898864746094
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3030815124511719, max_abs=9.0, mean_rel=0.1702893078327179, max_rel=1859.4329833984375, norm_rel=0.0249885693192482, ref_abs_avg=52.498939514160156, test_abs_avg=52.50102615356445
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2041946649551392, max_abs=8.5, mean_rel=0.318162739276886, max_rel=3874.999755859375, norm_rel=0.023301923647522926, ref_abs_avg=51.901329040527344, test_abs_avg=51.89552307128906
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9620304107666016, max_abs=4.0, mean_rel=0.11359436810016632, max_rel=8.952254295349121, norm_rel=0.024433806538581848, ref_abs_avg=39.38050079345703, test_abs_avg=39.37392807006836
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2015135288238525, max_abs=8.0, mean_rel=0.16426345705986023, max_rel=1700.5506591796875, norm_rel=0.024690447375178337, ref_abs_avg=48.95586395263672, test_abs_avg=48.955078125
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1182883977890015, max_abs=7.25, mean_rel=0.3911386728286743, max_rel=3437.499755859375, norm_rel=0.02320658415555954, ref_abs_avg=48.48320007324219, test_abs_avg=48.48292541503906
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8724002838134766, max_abs=4.0, mean_rel=0.0789240375161171, max_rel=6.573549747467041, norm_rel=0.023867441341280937, ref_abs_avg=36.98554229736328, test_abs_avg=36.89567565917969
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1270231008529663, max_abs=8.0, mean_rel=0.1687222123146057, max_rel=1336.12744140625, norm_rel=0.024587249383330345, ref_abs_avg=46.10129928588867, test_abs_avg=46.1004753112793
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0437476634979248, max_abs=6.375, mean_rel=0.29881834983825684, max_rel=3437.499755859375, norm_rel=0.022932803258299828, ref_abs_avg=45.71990203857422, test_abs_avg=45.725128173828125
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.812035322189331, max_abs=3.375, mean_rel=0.10125213861465454, max_rel=9.349071502685547, norm_rel=0.024354761466383934, ref_abs_avg=33.75377655029297, test_abs_avg=33.605613708496094
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0659825801849365, max_abs=7.0, mean_rel=0.16357319056987762, max_rel=1971.69873046875, norm_rel=0.02437928318977356, ref_abs_avg=44.00061798095703, test_abs_avg=43.9974479675293
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9801626205444336, max_abs=6.0, mean_rel=0.3050772249698639, max_rel=2437.5, norm_rel=0.022605782374739647, ref_abs_avg=43.446956634521484, test_abs_avg=43.440162658691406
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.8164255619049072, max_abs=3.0, mean_rel=0.17042970657348633, max_rel=24.832134246826172, norm_rel=0.0251141507178545, ref_abs_avg=32.20331573486328, test_abs_avg=32.2742805480957
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0061832666397095, max_abs=7.0, mean_rel=0.16962921619415283, max_rel=3129.768798828125, norm_rel=0.024330222979187965, ref_abs_avg=41.55921173095703, test_abs_avg=41.5601692199707
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9282066822052002, max_abs=5.53125, mean_rel=0.3041267991065979, max_rel=2312.5, norm_rel=0.02265509031713009, ref_abs_avg=41.16310119628906, test_abs_avg=41.16994094848633
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7555948495864868, max_abs=2.75, mean_rel=0.10987387597560883, max_rel=9.847570419311523, norm_rel=0.022474396973848343, ref_abs_avg=33.39316940307617, test_abs_avg=33.453094482421875
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9634007215499878, max_abs=7.0, mean_rel=0.1554076075553894, max_rel=892.9730834960938, norm_rel=0.023982197046279907, ref_abs_avg=40.391456604003906, test_abs_avg=40.39195251464844
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8849917054176331, max_abs=6.5, mean_rel=0.27439993619918823, max_rel=2250.0, norm_rel=0.022433752194046974, ref_abs_avg=39.626956939697266, test_abs_avg=39.62870407104492
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8355995416641235, max_abs=4.5625, mean_rel=1.1102302074432373, max_rel=521.6608276367188, norm_rel=0.024162113666534424, ref_abs_avg=35.92981719970703, test_abs_avg=36.00499725341797
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1097455024719238, max_abs=8.0, mean_rel=0.18809650838375092, max_rel=1025.447998046875, norm_rel=0.026084506884217262, ref_abs_avg=42.801849365234375, test_abs_avg=42.80480194091797
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0173203945159912, max_abs=6.0, mean_rel=0.3667876124382019, max_rel=2781.249755859375, norm_rel=0.02446095272898674, ref_abs_avg=41.685646057128906, test_abs_avg=41.68596649169922
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8128764629364014, max_abs=2.875, mean_rel=0.18225708603858948, max_rel=19.641511917114258, norm_rel=0.025040224194526672, ref_abs_avg=31.705829620361328, test_abs_avg=31.657058715820312
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.019553542137146, max_abs=7.0, mean_rel=0.1816398799419403, max_rel=1891.331298828125, norm_rel=0.026230860501527786, ref_abs_avg=39.047080993652344, test_abs_avg=39.04615020751953
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9536933302879333, max_abs=6.0, mean_rel=0.33259594440460205, max_rel=2250.0, norm_rel=0.024856600910425186, ref_abs_avg=38.54276657104492, test_abs_avg=38.54888153076172
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7330036163330078, max_abs=2.6875, mean_rel=0.06481177359819412, max_rel=8.692400932312012, norm_rel=0.023049132898449898, ref_abs_avg=32.443382263183594, test_abs_avg=32.420860290527344
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9579511880874634, max_abs=7.5, mean_rel=0.1709783673286438, max_rel=2383.5283203125, norm_rel=0.026042357087135315, ref_abs_avg=36.95179748535156, test_abs_avg=36.95266342163086
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8940852880477905, max_abs=5.5, mean_rel=0.2629373073577881, max_rel=2484.375, norm_rel=0.024516409263014793, ref_abs_avg=36.52764129638672, test_abs_avg=36.52689743041992
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6745429635047913, max_abs=2.375, mean_rel=0.06746669113636017, max_rel=2.265482187271118, norm_rel=0.023559514433145523, ref_abs_avg=28.592866897583008, test_abs_avg=28.584476470947266
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8935587406158447, max_abs=6.0, mean_rel=0.16739951074123383, max_rel=1024.936279296875, norm_rel=0.0257993396371603, ref_abs_avg=34.76472091674805, test_abs_avg=34.76593017578125
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8424614667892456, max_abs=5.375, mean_rel=0.27891674637794495, max_rel=2187.5, norm_rel=0.024703729897737503, ref_abs_avg=34.24823760986328, test_abs_avg=34.252281188964844
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6601524353027344, max_abs=2.80859375, mean_rel=0.09018832445144653, max_rel=4.991516590118408, norm_rel=0.02323945239186287, ref_abs_avg=28.486955642700195, test_abs_avg=28.49326515197754
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8471009731292725, max_abs=6.5, mean_rel=0.17690426111221313, max_rel=2394.578125, norm_rel=0.0256191473454237, ref_abs_avg=33.15801239013672, test_abs_avg=33.15815734863281
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7900450229644775, max_abs=5.0, mean_rel=0.2926877737045288, max_rel=2500.0, norm_rel=0.024073952808976173, ref_abs_avg=32.909202575683594, test_abs_avg=32.90917205810547
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6344509124755859, max_abs=2.25, mean_rel=0.07086722552776337, max_rel=3.3190512657165527, norm_rel=0.023334138095378876, ref_abs_avg=27.190330505371094, test_abs_avg=27.173490524291992
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.799685537815094, max_abs=6.0, mean_rel=0.16548925638198853, max_rel=1253.3939208984375, norm_rel=0.025202861055731773, ref_abs_avg=31.846338272094727, test_abs_avg=31.845630645751953
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7493346929550171, max_abs=4.5, mean_rel=0.2908744215965271, max_rel=3062.499755859375, norm_rel=0.023960841819643974, ref_abs_avg=31.380191802978516, test_abs_avg=31.37491226196289
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5755777359008789, max_abs=2.15625, mean_rel=0.09624462574720383, max_rel=5.475215911865234, norm_rel=0.022406207397580147, ref_abs_avg=25.97134017944336, test_abs_avg=25.988088607788086
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7646273970603943, max_abs=5.5, mean_rel=0.1781277060508728, max_rel=2391.768310546875, norm_rel=0.025093674659729004, ref_abs_avg=30.583213806152344, test_abs_avg=30.58394432067871
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7129014730453491, max_abs=4.375, mean_rel=0.2995477616786957, max_rel=1906.2498779296875, norm_rel=0.023640230298042297, ref_abs_avg=30.240280151367188, test_abs_avg=30.23752784729004
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5558373928070068, max_abs=2.125, mean_rel=0.2332659810781479, max_rel=67.53121948242188, norm_rel=0.02360949106514454, ref_abs_avg=23.363550186157227, test_abs_avg=23.398540496826172
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.732426643371582, max_abs=4.5, mean_rel=0.16405892372131348, max_rel=1663.744384765625, norm_rel=0.02488606795668602, ref_abs_avg=29.506683349609375, test_abs_avg=29.507030487060547
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6830877661705017, max_abs=4.734375, mean_rel=0.2715131938457489, max_rel=2375.0, norm_rel=0.023555492982268333, ref_abs_avg=29.091655731201172, test_abs_avg=29.087003707885742
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6594858169555664, max_abs=3.0, mean_rel=0.11986882984638214, max_rel=9.467727661132812, norm_rel=0.02561919018626213, ref_abs_avg=25.945098876953125, test_abs_avg=26.002986907958984
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8225420713424683, max_abs=6.0, mean_rel=0.1730046570301056, max_rel=808.0667724609375, norm_rel=0.026326384395360947, ref_abs_avg=31.374462127685547, test_abs_avg=31.378644943237305
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7640206813812256, max_abs=5.75, mean_rel=0.30416959524154663, max_rel=2437.5, norm_rel=0.024911746382713318, ref_abs_avg=30.789520263671875, test_abs_avg=30.79594612121582
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5989199876785278, max_abs=2.25, mean_rel=0.09620049595832825, max_rel=5.29246711730957, norm_rel=0.0250356774777174, ref_abs_avg=23.84389877319336, test_abs_avg=23.825244903564453
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7666987180709839, max_abs=6.0, mean_rel=0.157046377658844, max_rel=1083.183349609375, norm_rel=0.025777891278266907, ref_abs_avg=29.799884796142578, test_abs_avg=29.799484252929688
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7153007984161377, max_abs=4.5, mean_rel=0.257975697517395, max_rel=2812.499755859375, norm_rel=0.024565599858760834, ref_abs_avg=29.178089141845703, test_abs_avg=29.179462432861328
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5611615180969238, max_abs=2.359375, mean_rel=0.1549755334854126, max_rel=28.177892684936523, norm_rel=0.024714648723602295, ref_abs_avg=23.393003463745117, test_abs_avg=23.398605346679688
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.7079270482063293, max_abs=5.0, mean_rel=0.15966682136058807, max_rel=809.1988525390625, norm_rel=0.025400435552001, ref_abs_avg=27.921274185180664, test_abs_avg=27.921527862548828
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6582063436508179, max_abs=4.25, mean_rel=0.2610991895198822, max_rel=2281.25, norm_rel=0.023801449686288834, ref_abs_avg=27.691734313964844, test_abs_avg=27.692886352539062
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5209100246429443, max_abs=2.0, mean_rel=0.08074726164340973, max_rel=4.20281982421875, norm_rel=0.02282021753489971, ref_abs_avg=22.526409149169922, test_abs_avg=22.520009994506836
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6652283668518066, max_abs=5.0, mean_rel=0.15993255376815796, max_rel=734.3947143554688, norm_rel=0.02491801418364048, ref_abs_avg=26.72528076171875, test_abs_avg=26.725669860839844
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6127451062202454, max_abs=4.78125, mean_rel=0.2649676203727722, max_rel=1999.9998779296875, norm_rel=0.02340979315340519, ref_abs_avg=26.21833038330078, test_abs_avg=26.22170639038086
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.5030829906463623, max_abs=1.9375, mean_rel=0.2737380862236023, max_rel=69.841064453125, norm_rel=0.022987689822912216, ref_abs_avg=21.435123443603516, test_abs_avg=21.441959381103516
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6288113594055176, max_abs=5.0, mean_rel=0.15968982875347137, max_rel=1353.9442138671875, norm_rel=0.02459634467959404, ref_abs_avg=25.575550079345703, test_abs_avg=25.57626724243164
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.580909252166748, max_abs=4.0, mean_rel=0.20973196625709534, max_rel=1937.4998779296875, norm_rel=0.02311243861913681, ref_abs_avg=25.140933990478516, test_abs_avg=25.14600944519043
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.5039634704589844, max_abs=1.875, mean_rel=0.13628652691841125, max_rel=14.117469787597656, norm_rel=0.024160701781511307, ref_abs_avg=20.292320251464844, test_abs_avg=20.30084228515625
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.593615710735321, max_abs=4.5, mean_rel=0.14262989163398743, max_rel=403.805908203125, norm_rel=0.0243445485830307, ref_abs_avg=24.436878204345703, test_abs_avg=24.436920166015625
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5539430975914001, max_abs=4.375, mean_rel=0.24898375570774078, max_rel=1999.9998779296875, norm_rel=0.022607283666729927, ref_abs_avg=24.52066993713379, test_abs_avg=24.517108917236328
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.45050811767578125, max_abs=1.75, mean_rel=0.10479573905467987, max_rel=10.855088233947754, norm_rel=0.023307550698518753, ref_abs_avg=19.202573776245117, test_abs_avg=19.174203872680664
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5719418525695801, max_abs=5.0, mean_rel=0.15067656338214874, max_rel=828.1070556640625, norm_rel=0.023958632722496986, ref_abs_avg=23.877925872802734, test_abs_avg=23.88066864013672
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5263299942016602, max_abs=4.0, mean_rel=0.22648946940898895, max_rel=1843.7498779296875, norm_rel=0.022253049537539482, ref_abs_avg=23.69558334350586, test_abs_avg=23.69597816467285
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.3986012935638428, max_abs=1.546875, mean_rel=0.14450505375862122, max_rel=20.53541374206543, norm_rel=0.02046937309205532, ref_abs_avg=19.040000915527344, test_abs_avg=19.070598602294922
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5435123443603516, max_abs=5.0, mean_rel=0.14721941947937012, max_rel=1308.5587158203125, norm_rel=0.02357396110892296, ref_abs_avg=23.078731536865234, test_abs_avg=23.079133987426758
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.5030243396759033, max_abs=4.5, mean_rel=0.23493435978889465, max_rel=1468.7498779296875, norm_rel=0.022346392273902893, ref_abs_avg=22.514820098876953, test_abs_avg=22.514326095581055
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4714479446411133, max_abs=1.6875, mean_rel=0.07686297595500946, max_rel=6.283755779266357, norm_rel=0.02321731299161911, ref_abs_avg=20.592437744140625, test_abs_avg=20.575912475585938
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.6062238216400146, max_abs=5.0, mean_rel=0.15567263960838318, max_rel=1310.257080078125, norm_rel=0.024864457547664642, ref_abs_avg=24.445781707763672, test_abs_avg=24.44649887084961
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5614876747131348, max_abs=4.0, mean_rel=0.2398446500301361, max_rel=3312.499755859375, norm_rel=0.023505736142396927, ref_abs_avg=23.97626304626465, test_abs_avg=23.980850219726562
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.44498348236083984, max_abs=1.75, mean_rel=0.0819242000579834, max_rel=4.884343147277832, norm_rel=0.022666053846478462, ref_abs_avg=19.778949737548828, test_abs_avg=19.78473663330078
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5675803422927856, max_abs=5.0, mean_rel=0.14941149950027466, max_rel=1160.80712890625, norm_rel=0.024182232096791267, ref_abs_avg=23.469127655029297, test_abs_avg=23.470603942871094
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5186004042625427, max_abs=3.59375, mean_rel=0.22501981258392334, max_rel=2375.0, norm_rel=0.022461459040641785, ref_abs_avg=23.11919403076172, test_abs_avg=23.12305450439453
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.4043203294277191, max_abs=1.5, mean_rel=0.11311865597963333, max_rel=11.163599014282227, norm_rel=0.022076886147260666, ref_abs_avg=18.147197723388672, test_abs_avg=18.118732452392578
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5229805111885071, max_abs=4.75, mean_rel=0.14660026133060455, max_rel=923.2720947265625, norm_rel=0.0237470380961895, ref_abs_avg=22.046449661254883, test_abs_avg=22.047443389892578
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4804062843322754, max_abs=3.625, mean_rel=0.2151518166065216, max_rel=1406.2498779296875, norm_rel=0.022290805354714394, ref_abs_avg=21.53233528137207, test_abs_avg=21.53018569946289
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.39097094535827637, max_abs=2.0, mean_rel=0.07652609050273895, max_rel=2.6643764972686768, norm_rel=0.022441726177930832, ref_abs_avg=18.040355682373047, test_abs_avg=18.057235717773438
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.48418867588043213, max_abs=6.0, mean_rel=0.13554485142230988, max_rel=837.7950439453125, norm_rel=0.023057077080011368, ref_abs_avg=21.07796287536621, test_abs_avg=21.078750610351562
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.43484362959861755, max_abs=3.5, mean_rel=0.19263139367103577, max_rel=1250.0, norm_rel=0.020948542281985283, ref_abs_avg=20.75836944580078, test_abs_avg=20.75920295715332
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.34489011764526367, max_abs=1.7646484375, mean_rel=0.14088936150074005, max_rel=25.249126434326172, norm_rel=0.021393252536654472, ref_abs_avg=16.377769470214844, test_abs_avg=16.38683319091797
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4525810480117798, max_abs=4.25, mean_rel=0.14254559576511383, max_rel=474.9052734375, norm_rel=0.022403445094823837, ref_abs_avg=20.277992248535156, test_abs_avg=20.278976440429688
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.40606433153152466, max_abs=3.25, mean_rel=0.16601112484931946, max_rel=937.4999389648438, norm_rel=0.02034001611173153, ref_abs_avg=20.022659301757812, test_abs_avg=20.01275634765625
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.33705568313598633, max_abs=1.75, mean_rel=0.06881939619779587, max_rel=5.283928394317627, norm_rel=0.021508794277906418, ref_abs_avg=15.651432037353516, test_abs_avg=15.625113487243652
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.42273062467575073, max_abs=5.5, mean_rel=0.13081949949264526, max_rel=485.21673583984375, norm_rel=0.021867839619517326, ref_abs_avg=19.513965606689453, test_abs_avg=19.51541519165039
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.38600343465805054, max_abs=4.0, mean_rel=0.18431836366653442, max_rel=1906.2498779296875, norm_rel=0.019943973049521446, ref_abs_avg=19.365272521972656, test_abs_avg=19.36781883239746
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.323793888092041, max_abs=1.125, mean_rel=0.11021582782268524, max_rel=10.924825668334961, norm_rel=0.020669816061854362, ref_abs_avg=15.290216445922852, test_abs_avg=15.296327590942383
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.39138323068618774, max_abs=5.0, mean_rel=0.12980082631111145, max_rel=666.8961181640625, norm_rel=0.021532610058784485, ref_abs_avg=18.39264678955078, test_abs_avg=18.392498016357422
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3556043803691864, max_abs=3.5, mean_rel=0.1771906018257141, max_rel=1374.9998779296875, norm_rel=0.019372515380382538, ref_abs_avg=18.51291275024414, test_abs_avg=18.518234252929688
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2873809337615967, max_abs=1.203125, mean_rel=0.09323207288980484, max_rel=6.812236785888672, norm_rel=0.019149020314216614, ref_abs_avg=15.04410457611084, test_abs_avg=15.029367446899414
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3772353529930115, max_abs=4.5, mean_rel=0.12673301994800568, max_rel=1176.2586669921875, norm_rel=0.02083905227482319, ref_abs_avg=18.41585350036621, test_abs_avg=18.41537094116211
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3357917070388794, max_abs=3.875, mean_rel=0.1628839671611786, max_rel=953.1249389648438, norm_rel=0.01943545788526535, ref_abs_avg=17.740827560424805, test_abs_avg=17.743356704711914
production_forward2 vs paper_forward output: mean_abs=0.0016596447676420212, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.00872183870524168, max_abs=0.3828125, mean_rel=0.07523281127214432, max_rel=125.7153091430664, norm_rel=0.02054622210562229, ref_abs_avg=0.4576581120491028, test_abs_avg=0.45766234397888184
production_forward2 grad[1] vs paper_forward: mean_abs=7.518611907958984, max_abs=56.0, mean_rel=0.17114993929862976, max_rel=560.8703002929688, norm_rel=0.021238287910819054, ref_abs_avg=316.9703369140625, test_abs_avg=316.97821044921875
production_forward2 grad[2] vs paper_forward: mean_abs=1.235542893409729, max_abs=5.0, mean_rel=0.13358302414417267, max_rel=18.557973861694336, norm_rel=0.023317918181419373, ref_abs_avg=54.396995544433594, test_abs_avg=54.385108947753906
production_forward2 grad[3] vs paper_forward: mean_abs=1.6160757541656494, max_abs=12.0, mean_rel=0.16899433732032776, max_rel=1350.6617431640625, norm_rel=0.025296876206994057, ref_abs_avg=64.22499084472656, test_abs_avg=64.22992706298828
production_forward2 grad[4] vs paper_forward: mean_abs=1.4956257343292236, max_abs=9.6875, mean_rel=0.5572162866592407, max_rel=5624.99951171875, norm_rel=0.023616082966327667, ref_abs_avg=63.645503997802734, test_abs_avg=63.65435791015625
production_forward2 grad[5] vs paper_forward: mean_abs=1.0720778703689575, max_abs=4.5, mean_rel=0.23159851133823395, max_rel=36.77000045776367, norm_rel=0.024169500917196274, ref_abs_avg=46.02235412597656, test_abs_avg=46.057762145996094
production_forward2 grad[6] vs paper_forward: mean_abs=1.428748369216919, max_abs=9.5, mean_rel=0.16931435465812683, max_rel=1620.297119140625, norm_rel=0.025085775181651115, ref_abs_avg=57.30668640136719, test_abs_avg=57.31277847290039
production_forward2 grad[7] vs paper_forward: mean_abs=1.3213624954223633, max_abs=8.0, mean_rel=0.3542572855949402, max_rel=3937.499755859375, norm_rel=0.023779258131980896, ref_abs_avg=55.828392028808594, test_abs_avg=55.830047607421875
production_forward2 grad[8] vs paper_forward: mean_abs=1.011368751525879, max_abs=4.0, mean_rel=0.12676145136356354, max_rel=21.344905853271484, norm_rel=0.023835521191358566, ref_abs_avg=42.657676696777344, test_abs_avg=42.58184051513672
production_forward2 grad[9] vs paper_forward: mean_abs=1.3006374835968018, max_abs=8.0, mean_rel=0.17494507133960724, max_rel=1931.248291015625, norm_rel=0.024954138323664665, ref_abs_avg=52.498939514160156, test_abs_avg=52.5016975402832
production_forward2 grad[10] vs paper_forward: mean_abs=1.200879693031311, max_abs=8.0, mean_rel=0.32882118225097656, max_rel=3499.999755859375, norm_rel=0.023232920095324516, ref_abs_avg=51.901329040527344, test_abs_avg=51.89885711669922
production_forward2 grad[11] vs paper_forward: mean_abs=0.9543437957763672, max_abs=3.75, mean_rel=0.11901409924030304, max_rel=8.745026588439941, norm_rel=0.02409798838198185, ref_abs_avg=39.38050079345703, test_abs_avg=39.37724304199219
production_forward2 grad[12] vs paper_forward: mean_abs=1.1985046863555908, max_abs=8.0, mean_rel=0.16723424196243286, max_rel=1986.527587890625, norm_rel=0.024621358141303062, ref_abs_avg=48.95586395263672, test_abs_avg=48.9542350769043
production_forward2 grad[13] vs paper_forward: mean_abs=1.1115599870681763, max_abs=6.75, mean_rel=0.36856505274772644, max_rel=3062.499755859375, norm_rel=0.023050999268889427, ref_abs_avg=48.48320007324219, test_abs_avg=48.482337951660156
production_forward2 grad[14] vs paper_forward: mean_abs=0.8673439025878906, max_abs=3.5, mean_rel=0.07929173111915588, max_rel=6.86069393157959, norm_rel=0.023703008890151978, ref_abs_avg=36.98554229736328, test_abs_avg=36.9184455871582
production_forward2 grad[15] vs paper_forward: mean_abs=1.1240049600601196, max_abs=7.5, mean_rel=0.1609531193971634, max_rel=1030.722900390625, norm_rel=0.024507498368620872, ref_abs_avg=46.10129928588867, test_abs_avg=46.10127258300781
production_forward2 grad[16] vs paper_forward: mean_abs=1.0415211915969849, max_abs=6.375, mean_rel=0.3023971915245056, max_rel=2999.999755859375, norm_rel=0.022882800549268723, ref_abs_avg=45.71990203857422, test_abs_avg=45.72498321533203
production_forward2 grad[17] vs paper_forward: mean_abs=0.8513922691345215, max_abs=3.875, mean_rel=0.11422443389892578, max_rel=9.945980072021484, norm_rel=0.02485891990363598, ref_abs_avg=33.75377655029297, test_abs_avg=33.58976364135742
production_forward2 grad[18] vs paper_forward: mean_abs=1.0650482177734375, max_abs=8.0, mean_rel=0.16024619340896606, max_rel=1050.1881103515625, norm_rel=0.024351656436920166, ref_abs_avg=44.00061798095703, test_abs_avg=43.99932861328125
production_forward2 grad[19] vs paper_forward: mean_abs=0.9797614812850952, max_abs=6.1875, mean_rel=0.3078143000602722, max_rel=2515.625, norm_rel=0.022582542151212692, ref_abs_avg=43.446956634521484, test_abs_avg=43.442604064941406
production_forward2 grad[20] vs paper_forward: mean_abs=0.8188509941101074, max_abs=3.25, mean_rel=0.17759767174720764, max_rel=26.24574851989746, norm_rel=0.025080785155296326, ref_abs_avg=32.20331573486328, test_abs_avg=32.26971435546875
production_forward2 grad[21] vs paper_forward: mean_abs=1.0021294355392456, max_abs=6.5, mean_rel=0.16895419359207153, max_rel=3458.704345703125, norm_rel=0.024238863959908485, ref_abs_avg=41.55921173095703, test_abs_avg=41.55926513671875
production_forward2 grad[22] vs paper_forward: mean_abs=0.9240310192108154, max_abs=5.25, mean_rel=0.30194470286369324, max_rel=3281.249755859375, norm_rel=0.022550838068127632, ref_abs_avg=41.16310119628906, test_abs_avg=41.170475006103516
production_forward2 grad[23] vs paper_forward: mean_abs=0.7350478172302246, max_abs=3.0, mean_rel=0.11734405905008316, max_rel=10.753093719482422, norm_rel=0.022046737372875214, ref_abs_avg=33.39316940307617, test_abs_avg=33.454978942871094
production_forward2 grad[24] vs paper_forward: mean_abs=0.9609277844429016, max_abs=7.0, mean_rel=0.1604008674621582, max_rel=1701.850341796875, norm_rel=0.02391781099140644, ref_abs_avg=40.391456604003906, test_abs_avg=40.392234802246094
production_forward2 grad[25] vs paper_forward: mean_abs=0.8845260143280029, max_abs=6.5, mean_rel=0.2860536575317383, max_rel=2624.999755859375, norm_rel=0.022412844002246857, ref_abs_avg=39.626956939697266, test_abs_avg=39.62847900390625
production_forward2 grad[26] vs paper_forward: mean_abs=0.8741346597671509, max_abs=3.9375, mean_rel=0.8427691459655762, max_rel=375.3990173339844, norm_rel=0.025152171030640602, ref_abs_avg=35.92981719970703, test_abs_avg=35.996620178222656
production_forward2 grad[27] vs paper_forward: mean_abs=1.1062082052230835, max_abs=7.5, mean_rel=0.1924857497215271, max_rel=1758.528076171875, norm_rel=0.026008054614067078, ref_abs_avg=42.801849365234375, test_abs_avg=42.80419921875
production_forward2 grad[28] vs paper_forward: mean_abs=1.0199105739593506, max_abs=6.75, mean_rel=0.39817753434181213, max_rel=2906.249755859375, norm_rel=0.02452886290848255, ref_abs_avg=41.685646057128906, test_abs_avg=41.68279266357422
production_forward2 grad[29] vs paper_forward: mean_abs=0.818682074546814, max_abs=3.125, mean_rel=0.1812707483768463, max_rel=20.121456146240234, norm_rel=0.025368912145495415, ref_abs_avg=31.705829620361328, test_abs_avg=31.65584945678711
production_forward2 grad[30] vs paper_forward: mean_abs=1.0176997184753418, max_abs=7.0, mean_rel=0.18644672632217407, max_rel=2478.109619140625, norm_rel=0.026184439659118652, ref_abs_avg=39.047080993652344, test_abs_avg=39.046592712402344
production_forward2 grad[31] vs paper_forward: mean_abs=0.9525406360626221, max_abs=5.625, mean_rel=0.35575252771377563, max_rel=2937.499755859375, norm_rel=0.02482961304485798, ref_abs_avg=38.54276657104492, test_abs_avg=38.547149658203125
production_forward2 grad[32] vs paper_forward: mean_abs=0.7356967926025391, max_abs=2.75, mean_rel=0.06282871961593628, max_rel=7.717709064483643, norm_rel=0.0233231782913208, ref_abs_avg=32.443382263183594, test_abs_avg=32.37889099121094
production_forward2 grad[33] vs paper_forward: mean_abs=0.9562525749206543, max_abs=7.0, mean_rel=0.17035171389579773, max_rel=1373.8607177734375, norm_rel=0.02600974217057228, ref_abs_avg=36.95179748535156, test_abs_avg=36.95274353027344
production_forward2 grad[34] vs paper_forward: mean_abs=0.8925745487213135, max_abs=5.5, mean_rel=0.27388110756874084, max_rel=2890.624755859375, norm_rel=0.02446957863867283, ref_abs_avg=36.52764129638672, test_abs_avg=36.524688720703125
production_forward2 grad[35] vs paper_forward: mean_abs=0.6573066711425781, max_abs=2.25, mean_rel=0.06303738057613373, max_rel=1.188826322555542, norm_rel=0.022882260382175446, ref_abs_avg=28.592866897583008, test_abs_avg=28.588674545288086
production_forward2 grad[36] vs paper_forward: mean_abs=0.8932176828384399, max_abs=6.0, mean_rel=0.16761790215969086, max_rel=1246.4141845703125, norm_rel=0.025775546208024025, ref_abs_avg=34.76472091674805, test_abs_avg=34.76653289794922
production_forward2 grad[37] vs paper_forward: mean_abs=0.840904951095581, max_abs=5.0, mean_rel=0.2665790319442749, max_rel=1749.9998779296875, norm_rel=0.02464369684457779, ref_abs_avg=34.24823760986328, test_abs_avg=34.25299072265625
production_forward2 grad[38] vs paper_forward: mean_abs=0.6556549072265625, max_abs=3.015625, mean_rel=0.09103375673294067, max_rel=6.445797443389893, norm_rel=0.023483509197831154, ref_abs_avg=28.486955642700195, test_abs_avg=28.477094650268555
production_forward2 grad[39] vs paper_forward: mean_abs=0.8455075621604919, max_abs=6.5, mean_rel=0.17496566474437714, max_rel=2104.865234375, norm_rel=0.025582771748304367, ref_abs_avg=33.15801239013672, test_abs_avg=33.157562255859375
production_forward2 grad[40] vs paper_forward: mean_abs=0.7881841063499451, max_abs=5.0, mean_rel=0.2831805944442749, max_rel=2250.0, norm_rel=0.024009162560105324, ref_abs_avg=32.909202575683594, test_abs_avg=32.90788269042969
production_forward2 grad[41] vs paper_forward: mean_abs=0.6148529052734375, max_abs=2.0625, mean_rel=0.07544152438640594, max_rel=4.268858909606934, norm_rel=0.022312210872769356, ref_abs_avg=27.190330505371094, test_abs_avg=27.189414978027344
production_forward2 grad[42] vs paper_forward: mean_abs=0.8002496957778931, max_abs=5.25, mean_rel=0.16324208676815033, max_rel=1282.7144775390625, norm_rel=0.025217784568667412, ref_abs_avg=31.846338272094727, test_abs_avg=31.845714569091797
production_forward2 grad[43] vs paper_forward: mean_abs=0.747584342956543, max_abs=4.640625, mean_rel=0.2860108017921448, max_rel=2687.499755859375, norm_rel=0.023905865848064423, ref_abs_avg=31.380191802978516, test_abs_avg=31.376258850097656
production_forward2 grad[44] vs paper_forward: mean_abs=0.5730799436569214, max_abs=2.0, mean_rel=0.10031801462173462, max_rel=6.280080318450928, norm_rel=0.02226714789867401, ref_abs_avg=25.97134017944336, test_abs_avg=25.96950912475586
production_forward2 grad[45] vs paper_forward: mean_abs=0.7631908655166626, max_abs=5.5, mean_rel=0.17983965575695038, max_rel=2662.9658203125, norm_rel=0.025057047605514526, ref_abs_avg=30.583213806152344, test_abs_avg=30.584096908569336
production_forward2 grad[46] vs paper_forward: mean_abs=0.7107853889465332, max_abs=4.5, mean_rel=0.3057537376880646, max_rel=2562.5, norm_rel=0.023592300713062286, ref_abs_avg=30.240280151367188, test_abs_avg=30.23975372314453
production_forward2 grad[47] vs paper_forward: mean_abs=0.5681588649749756, max_abs=2.25, mean_rel=0.4008166790008545, max_rel=135.89859008789062, norm_rel=0.024327872321009636, ref_abs_avg=23.363550186157227, test_abs_avg=23.409603118896484
production_forward2 grad[48] vs paper_forward: mean_abs=0.7309867143630981, max_abs=5.5, mean_rel=0.16690056025981903, max_rel=1782.8236083984375, norm_rel=0.024853328242897987, ref_abs_avg=29.506683349609375, test_abs_avg=29.508018493652344
production_forward2 grad[49] vs paper_forward: mean_abs=0.6846193075180054, max_abs=4.390625, mean_rel=0.2827363908290863, max_rel=2125.0, norm_rel=0.023602522909641266, ref_abs_avg=29.091655731201172, test_abs_avg=29.086410522460938
production_forward2 grad[50] vs paper_forward: mean_abs=0.6675987243652344, max_abs=3.4375, mean_rel=0.13162364065647125, max_rel=12.347036361694336, norm_rel=0.025710372254252434, ref_abs_avg=25.945098876953125, test_abs_avg=25.972076416015625
production_forward2 grad[51] vs paper_forward: mean_abs=0.8216434717178345, max_abs=6.5, mean_rel=0.1725749522447586, max_rel=946.000732421875, norm_rel=0.02629964053630829, ref_abs_avg=31.374462127685547, test_abs_avg=31.37845802307129
production_forward2 grad[52] vs paper_forward: mean_abs=0.7639688849449158, max_abs=5.75, mean_rel=0.2859417796134949, max_rel=2828.124755859375, norm_rel=0.024901175871491432, ref_abs_avg=30.789520263671875, test_abs_avg=30.794286727905273
production_forward2 grad[53] vs paper_forward: mean_abs=0.5920724868774414, max_abs=2.0, mean_rel=0.0829407200217247, max_rel=2.5668139457702637, norm_rel=0.02476128563284874, ref_abs_avg=23.84389877319336, test_abs_avg=23.848419189453125
production_forward2 grad[54] vs paper_forward: mean_abs=0.7660196423530579, max_abs=6.0, mean_rel=0.15709173679351807, max_rel=939.56591796875, norm_rel=0.025762192904949188, ref_abs_avg=29.799884796142578, test_abs_avg=29.799314498901367
production_forward2 grad[55] vs paper_forward: mean_abs=0.7143150568008423, max_abs=5.0, mean_rel=0.2613217830657959, max_rel=2156.25, norm_rel=0.02452297881245613, ref_abs_avg=29.178089141845703, test_abs_avg=29.17911148071289
production_forward2 grad[56] vs paper_forward: mean_abs=0.548835277557373, max_abs=2.3125, mean_rel=0.16204172372817993, max_rel=35.75729751586914, norm_rel=0.02441256493330002, ref_abs_avg=23.393003463745117, test_abs_avg=23.39767837524414
production_forward2 grad[57] vs paper_forward: mean_abs=0.7070616483688354, max_abs=6.0, mean_rel=0.15724389255046844, max_rel=584.83251953125, norm_rel=0.025365442037582397, ref_abs_avg=27.921274185180664, test_abs_avg=27.92249298095703
production_forward2 grad[58] vs paper_forward: mean_abs=0.657014012336731, max_abs=4.75, mean_rel=0.262168824672699, max_rel=2031.2498779296875, norm_rel=0.02377626672387123, ref_abs_avg=27.691734313964844, test_abs_avg=27.69387435913086
production_forward2 grad[59] vs paper_forward: mean_abs=0.513371467590332, max_abs=2.0, mean_rel=0.08414129912853241, max_rel=4.311155796051025, norm_rel=0.022920213639736176, ref_abs_avg=22.526409149169922, test_abs_avg=22.509784698486328
production_forward2 grad[60] vs paper_forward: mean_abs=0.6638566255569458, max_abs=5.0, mean_rel=0.16735219955444336, max_rel=1163.5904541015625, norm_rel=0.024872466921806335, ref_abs_avg=26.72528076171875, test_abs_avg=26.726045608520508
production_forward2 grad[61] vs paper_forward: mean_abs=0.6112605333328247, max_abs=4.25, mean_rel=0.2587607204914093, max_rel=1906.2498779296875, norm_rel=0.02335566096007824, ref_abs_avg=26.21833038330078, test_abs_avg=26.225221633911133
production_forward2 grad[62] vs paper_forward: mean_abs=0.5036075115203857, max_abs=2.28125, mean_rel=0.24752381443977356, max_rel=53.069580078125, norm_rel=0.023138757795095444, ref_abs_avg=21.435123443603516, test_abs_avg=21.441194534301758
production_forward2 grad[63] vs paper_forward: mean_abs=0.6284253597259521, max_abs=6.0, mean_rel=0.1595262736082077, max_rel=1317.8447265625, norm_rel=0.024561692029237747, ref_abs_avg=25.575550079345703, test_abs_avg=25.576982498168945
production_forward2 grad[64] vs paper_forward: mean_abs=0.5811009407043457, max_abs=3.75, mean_rel=0.20210841298103333, max_rel=1515.6248779296875, norm_rel=0.023129059001803398, ref_abs_avg=25.140933990478516, test_abs_avg=25.148273468017578
production_forward2 grad[65] vs paper_forward: mean_abs=0.4985947608947754, max_abs=2.0, mean_rel=0.13024526834487915, max_rel=13.84009075164795, norm_rel=0.024007568135857582, ref_abs_avg=20.292320251464844, test_abs_avg=20.296354293823242
production_forward2 grad[66] vs paper_forward: mean_abs=0.5928666591644287, max_abs=4.0, mean_rel=0.14482563734054565, max_rel=557.7881469726562, norm_rel=0.024319441989064217, ref_abs_avg=24.436878204345703, test_abs_avg=24.43634796142578
production_forward2 grad[67] vs paper_forward: mean_abs=0.5480890274047852, max_abs=4.5, mean_rel=0.2513645887374878, max_rel=1874.9998779296875, norm_rel=0.022358983755111694, ref_abs_avg=24.52066993713379, test_abs_avg=24.52056121826172
production_forward2 grad[68] vs paper_forward: mean_abs=0.4330967664718628, max_abs=1.75, mean_rel=0.10460963845252991, max_rel=12.941750526428223, norm_rel=0.022806627675890923, ref_abs_avg=19.202573776245117, test_abs_avg=19.16472625732422
production_forward2 grad[69] vs paper_forward: mean_abs=0.5713949203491211, max_abs=5.0, mean_rel=0.1529541313648224, max_rel=921.1658935546875, norm_rel=0.023940300568938255, ref_abs_avg=23.877925872802734, test_abs_avg=23.880504608154297
production_forward2 grad[70] vs paper_forward: mean_abs=0.5246631503105164, max_abs=3.5, mean_rel=0.2107846438884735, max_rel=1468.7498779296875, norm_rel=0.022182967513799667, ref_abs_avg=23.69558334350586, test_abs_avg=23.696876525878906
production_forward2 grad[71] vs paper_forward: mean_abs=0.4000275135040283, max_abs=1.375, mean_rel=0.2052532136440277, max_rel=36.862403869628906, norm_rel=0.020592648535966873, ref_abs_avg=19.040000915527344, test_abs_avg=19.080486297607422
production_forward2 grad[72] vs paper_forward: mean_abs=0.5427153706550598, max_abs=4.25, mean_rel=0.14824256300926208, max_rel=925.6934814453125, norm_rel=0.023553280159831047, ref_abs_avg=23.078731536865234, test_abs_avg=23.078899383544922
production_forward2 grad[73] vs paper_forward: mean_abs=0.500182032585144, max_abs=3.625, mean_rel=0.2389756143093109, max_rel=1499.9998779296875, norm_rel=0.022211559116840363, ref_abs_avg=22.514820098876953, test_abs_avg=22.51364517211914
production_forward2 grad[74] vs paper_forward: mean_abs=0.47315025329589844, max_abs=2.0625, mean_rel=0.06479412317276001, max_rel=3.7131283283233643, norm_rel=0.02334199659526348, ref_abs_avg=20.592437744140625, test_abs_avg=20.563920974731445
production_forward2 grad[75] vs paper_forward: mean_abs=0.6050481796264648, max_abs=4.5, mean_rel=0.15571966767311096, max_rel=916.26953125, norm_rel=0.024812715128064156, ref_abs_avg=24.445781707763672, test_abs_avg=24.445993423461914
production_forward2 grad[76] vs paper_forward: mean_abs=0.5633209943771362, max_abs=4.25, mean_rel=0.25565841794013977, max_rel=3406.249755859375, norm_rel=0.023581132292747498, ref_abs_avg=23.97626304626465, test_abs_avg=23.980060577392578
production_forward2 grad[77] vs paper_forward: mean_abs=0.43441009521484375, max_abs=1.6875, mean_rel=0.0812825858592987, max_rel=3.421961545944214, norm_rel=0.022115468978881836, ref_abs_avg=19.778949737548828, test_abs_avg=19.772918701171875
production_forward2 grad[78] vs paper_forward: mean_abs=0.5670297145843506, max_abs=4.5, mean_rel=0.14906129240989685, max_rel=1069.5557861328125, norm_rel=0.02416362427175045, ref_abs_avg=23.469127655029297, test_abs_avg=23.470443725585938
production_forward2 grad[79] vs paper_forward: mean_abs=0.5144320130348206, max_abs=3.75, mean_rel=0.23324275016784668, max_rel=2500.0, norm_rel=0.022283893078565598, ref_abs_avg=23.11919403076172, test_abs_avg=23.122798919677734
production_forward2 grad[80] vs paper_forward: mean_abs=0.42975807189941406, max_abs=1.703125, mean_rel=0.09777432680130005, max_rel=4.771925449371338, norm_rel=0.02349553443491459, ref_abs_avg=18.147197723388672, test_abs_avg=18.114673614501953
production_forward2 grad[81] vs paper_forward: mean_abs=0.5223106741905212, max_abs=4.0, mean_rel=0.1455034613609314, max_rel=1314.606201171875, norm_rel=0.023719685152173042, ref_abs_avg=22.046449661254883, test_abs_avg=22.047122955322266
production_forward2 grad[82] vs paper_forward: mean_abs=0.4809865355491638, max_abs=3.5, mean_rel=0.22035111486911774, max_rel=1562.4998779296875, norm_rel=0.022306442260742188, ref_abs_avg=21.53233528137207, test_abs_avg=21.52968406677246
production_forward2 grad[83] vs paper_forward: mean_abs=0.3872438669204712, max_abs=2.125, mean_rel=0.06960245966911316, max_rel=2.839911937713623, norm_rel=0.02233167551457882, ref_abs_avg=18.040355682373047, test_abs_avg=18.06119155883789
production_forward2 grad[84] vs paper_forward: mean_abs=0.484092116355896, max_abs=5.0, mean_rel=0.1361159384250641, max_rel=1357.8087158203125, norm_rel=0.02306489273905754, ref_abs_avg=21.07796287536621, test_abs_avg=21.078472137451172
production_forward2 grad[85] vs paper_forward: mean_abs=0.43413758277893066, max_abs=3.875, mean_rel=0.19697198271751404, max_rel=1054.6875, norm_rel=0.020897671580314636, ref_abs_avg=20.75836944580078, test_abs_avg=20.755435943603516
production_forward2 grad[86] vs paper_forward: mean_abs=0.34339427947998047, max_abs=1.6865234375, mean_rel=0.1485643982887268, max_rel=21.552417755126953, norm_rel=0.021093914285302162, ref_abs_avg=16.377769470214844, test_abs_avg=16.386823654174805
production_forward2 grad[87] vs paper_forward: mean_abs=0.4523460268974304, max_abs=4.5, mean_rel=0.1426170915365219, max_rel=656.35205078125, norm_rel=0.022391146048903465, ref_abs_avg=20.277992248535156, test_abs_avg=20.27874755859375
production_forward2 grad[88] vs paper_forward: mean_abs=0.40552470088005066, max_abs=3.875, mean_rel=0.16723737120628357, max_rel=937.4999389648438, norm_rel=0.020289989188313484, ref_abs_avg=20.022659301757812, test_abs_avg=20.011470794677734
production_forward2 grad[89] vs paper_forward: mean_abs=0.3313232660293579, max_abs=1.75, mean_rel=0.0858829915523529, max_rel=7.698469161987305, norm_rel=0.0211687833070755, ref_abs_avg=15.651432037353516, test_abs_avg=15.638036727905273
production_forward2 grad[90] vs paper_forward: mean_abs=0.4219425916671753, max_abs=4.5, mean_rel=0.13426285982131958, max_rel=553.794921875, norm_rel=0.021817976608872414, ref_abs_avg=19.513965606689453, test_abs_avg=19.513816833496094
production_forward2 grad[91] vs paper_forward: mean_abs=0.38703733682632446, max_abs=4.5, mean_rel=0.1937350332736969, max_rel=1937.4998779296875, norm_rel=0.02000710368156433, ref_abs_avg=19.365272521972656, test_abs_avg=19.369659423828125
production_forward2 grad[92] vs paper_forward: mean_abs=0.3249976336956024, max_abs=1.0625, mean_rel=0.09672161191701889, max_rel=8.182908058166504, norm_rel=0.02060714177787304, ref_abs_avg=15.290216445922852, test_abs_avg=15.283788681030273
production_forward2 grad[93] vs paper_forward: mean_abs=0.3909877836704254, max_abs=4.5, mean_rel=0.1311953067779541, max_rel=730.3274536132812, norm_rel=0.021528074517846107, ref_abs_avg=18.39264678955078, test_abs_avg=18.39272689819336
production_forward2 grad[94] vs paper_forward: mean_abs=0.35511428117752075, max_abs=3.5, mean_rel=0.17273637652397156, max_rel=1437.4998779296875, norm_rel=0.019377512857317924, ref_abs_avg=18.51291275024414, test_abs_avg=18.518827438354492
production_forward2 grad[95] vs paper_forward: mean_abs=0.3013296127319336, max_abs=1.375, mean_rel=0.11043326556682587, max_rel=10.363049507141113, norm_rel=0.019898580387234688, ref_abs_avg=15.04410457611084, test_abs_avg=15.026397705078125
production_forward2 grad[96] vs paper_forward: mean_abs=0.37654250860214233, max_abs=4.75, mean_rel=0.12161329388618469, max_rel=1022.5014038085938, norm_rel=0.020785322412848473, ref_abs_avg=18.41585350036621, test_abs_avg=18.415538787841797
production_forward2 grad[97] vs paper_forward: mean_abs=0.3332233130931854, max_abs=3.75, mean_rel=0.15823085606098175, max_rel=937.4999389648438, norm_rel=0.019204720854759216, ref_abs_avg=17.740827560424805, test_abs_avg=17.740707397460938
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  165.957 ms
torch_compile_phases_forward bwd-only: 132.575 ms
torch_compile_phases_forward peak allocated: fwd=13.078 GiB, fwd+bwd=13.706 GiB
torch_compile_phases_forward peak reserved:  fwd=13.375 GiB, fwd+bwd=17.627 GiB
production_forward fwd+bwd:  116.329 ms
production_forward bwd-only: 95.958 ms
production_forward peak allocated: fwd=3.368 GiB, fwd+bwd=10.493 GiB
production_forward peak reserved:  fwd=3.596 GiB, fwd+bwd=11.596 GiB
production_forward2 fwd+bwd:  191.461 ms
production_forward2 bwd-only: 172.310 ms
production_forward2 peak allocated: fwd=2.864 GiB, fwd+bwd=6.243 GiB
production_forward2 peak reserved:  fwd=3.221 GiB, fwd+bwd=8.971 GiB
paper_forward fwd+bwd:  384.347 ms
paper_forward bwd-only: 303.887 ms
paper_forward peak allocated: fwd=30.003 GiB, fwd+bwd=32.122 GiB
paper_forward peak reserved:  fwd=30.018 GiB, fwd+bwd=32.768 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016217887168750167, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.00814913772046566, max_abs=0.421875, mean_rel=0.07152241468429565, max_rel=111.93756103515625, norm_rel=0.019507914781570435, ref_abs_avg=0.4524350166320801, test_abs_avg=0.45244038105010986
production_forward grad[1] vs paper_forward: mean_abs=7.099387168884277, max_abs=77.75, mean_rel=0.15709438920021057, max_rel=191.6571502685547, norm_rel=0.019684532657265663, ref_abs_avg=314.0838928222656, test_abs_avg=314.16436767578125
production_forward grad[2] vs paper_forward: mean_abs=1.2779622077941895, max_abs=4.59765625, mean_rel=0.08800186961889267, max_rel=4.335578918457031, norm_rel=0.024895545095205307, ref_abs_avg=51.695945739746094, test_abs_avg=51.65388488769531
production_forward grad[3] vs paper_forward: mean_abs=1.5662081241607666, max_abs=12.0, mean_rel=0.16535504162311554, max_rel=1249.3475341796875, norm_rel=0.024345967918634415, ref_abs_avg=64.81581115722656, test_abs_avg=64.81997680664062
production_forward grad[4] vs paper_forward: mean_abs=1.4441213607788086, max_abs=9.5, mean_rel=0.38134488463401794, max_rel=3062.499755859375, norm_rel=0.022709820419549942, ref_abs_avg=63.85792541503906, test_abs_avg=63.856754302978516
production_forward grad[5] vs paper_forward: mean_abs=1.0207209587097168, max_abs=3.5, mean_rel=0.11318176984786987, max_rel=26.24397087097168, norm_rel=0.021961942315101624, ref_abs_avg=46.34657287597656, test_abs_avg=46.394935607910156
production_forward grad[6] vs paper_forward: mean_abs=1.3638474941253662, max_abs=9.0, mean_rel=0.1649923324584961, max_rel=2495.5400390625, norm_rel=0.02417777292430401, ref_abs_avg=56.77775192260742, test_abs_avg=56.77770233154297
production_forward grad[7] vs paper_forward: mean_abs=1.2575068473815918, max_abs=8.21875, mean_rel=0.340381920337677, max_rel=3843.749755859375, norm_rel=0.02254139631986618, ref_abs_avg=56.00895690917969, test_abs_avg=56.00885772705078
production_forward grad[8] vs paper_forward: mean_abs=0.9331464767456055, max_abs=3.5, mean_rel=0.15222308039665222, max_rel=37.71989440917969, norm_rel=0.022932378575205803, ref_abs_avg=41.21363067626953, test_abs_avg=41.210853576660156
production_forward grad[9] vs paper_forward: mean_abs=1.228217363357544, max_abs=9.0, mean_rel=0.16999474167823792, max_rel=2412.0478515625, norm_rel=0.02389819175004959, ref_abs_avg=51.71076965332031, test_abs_avg=51.715972900390625
production_forward grad[10] vs paper_forward: mean_abs=1.1278865337371826, max_abs=7.0, mean_rel=0.308915376663208, max_rel=3593.749755859375, norm_rel=0.02227029763162136, ref_abs_avg=50.914794921875, test_abs_avg=50.90909194946289
production_forward grad[11] vs paper_forward: mean_abs=0.8748102188110352, max_abs=3.3125, mean_rel=0.14189492166042328, max_rel=19.45480728149414, norm_rel=0.023790499195456505, ref_abs_avg=37.534263610839844, test_abs_avg=37.45611572265625
production_forward grad[12] vs paper_forward: mean_abs=1.1395208835601807, max_abs=8.0, mean_rel=0.15384028851985931, max_rel=1056.873779296875, norm_rel=0.023788366466760635, ref_abs_avg=48.20590591430664, test_abs_avg=48.20610046386719
production_forward grad[13] vs paper_forward: mean_abs=1.0473569631576538, max_abs=6.203125, mean_rel=0.32093507051467896, max_rel=3562.499755859375, norm_rel=0.022129883989691734, ref_abs_avg=47.61308288574219, test_abs_avg=47.611026763916016
production_forward grad[14] vs paper_forward: mean_abs=0.8176890015602112, max_abs=3.5, mean_rel=0.5342005491256714, max_rel=226.92404174804688, norm_rel=0.021515721455216408, ref_abs_avg=38.398475646972656, test_abs_avg=38.37250518798828
production_forward grad[15] vs paper_forward: mean_abs=1.0722815990447998, max_abs=9.0, mean_rel=0.14673787355422974, max_rel=984.0009155273438, norm_rel=0.023628296330571175, ref_abs_avg=45.661468505859375, test_abs_avg=45.66161346435547
production_forward grad[16] vs paper_forward: mean_abs=0.9853565096855164, max_abs=6.625, mean_rel=0.31437137722969055, max_rel=4218.75, norm_rel=0.02194899134337902, ref_abs_avg=45.16523361206055, test_abs_avg=45.16271209716797
production_forward grad[17] vs paper_forward: mean_abs=0.8344449996948242, max_abs=3.125, mean_rel=0.1013169214129448, max_rel=14.600631713867188, norm_rel=0.022386305034160614, ref_abs_avg=36.58955001831055, test_abs_avg=36.547386169433594
production_forward grad[18] vs paper_forward: mean_abs=1.0094815492630005, max_abs=7.25, mean_rel=0.16057509183883667, max_rel=1304.5274658203125, norm_rel=0.023588601499795914, ref_abs_avg=43.10002517700195, test_abs_avg=43.09928512573242
production_forward grad[19] vs paper_forward: mean_abs=0.9250478744506836, max_abs=5.625, mean_rel=0.2867133617401123, max_rel=3499.999755859375, norm_rel=0.021724015474319458, ref_abs_avg=42.7528076171875, test_abs_avg=42.753211975097656
production_forward grad[20] vs paper_forward: mean_abs=0.7410778999328613, max_abs=2.9140625, mean_rel=0.09041954576969147, max_rel=4.362713813781738, norm_rel=0.022355124354362488, ref_abs_avg=33.62725067138672, test_abs_avg=33.69078063964844
production_forward grad[21] vs paper_forward: mean_abs=0.9611794352531433, max_abs=7.0, mean_rel=0.1486080437898636, max_rel=2359.229736328125, norm_rel=0.023418297991156578, ref_abs_avg=41.320228576660156, test_abs_avg=41.31814193725586
production_forward grad[22] vs paper_forward: mean_abs=0.878294050693512, max_abs=5.25, mean_rel=0.2980402112007141, max_rel=2390.625, norm_rel=0.02161983959376812, ref_abs_avg=40.88514709472656, test_abs_avg=40.88079833984375
production_forward grad[23] vs paper_forward: mean_abs=0.6745256185531616, max_abs=3.0625, mean_rel=1.0994153022766113, max_rel=531.802001953125, norm_rel=0.0217090155929327, ref_abs_avg=32.42936325073242, test_abs_avg=32.41905212402344
production_forward grad[24] vs paper_forward: mean_abs=0.9149534702301025, max_abs=8.0, mean_rel=0.15118394792079926, max_rel=1659.9473876953125, norm_rel=0.02332734316587448, ref_abs_avg=39.494239807128906, test_abs_avg=39.494483947753906
production_forward grad[25] vs paper_forward: mean_abs=0.8380016088485718, max_abs=5.5, mean_rel=0.2770633101463318, max_rel=2031.2498779296875, norm_rel=0.021412964910268784, ref_abs_avg=39.384620666503906, test_abs_avg=39.38482666015625
production_forward grad[26] vs paper_forward: mean_abs=0.829132080078125, max_abs=3.0, mean_rel=0.11243874579668045, max_rel=19.519634246826172, norm_rel=0.022797400131821632, ref_abs_avg=36.23738479614258, test_abs_avg=36.213802337646484
production_forward grad[27] vs paper_forward: mean_abs=1.0585182905197144, max_abs=7.5, mean_rel=0.16862967610359192, max_rel=1895.67724609375, norm_rel=0.025202477350831032, ref_abs_avg=42.27507019042969, test_abs_avg=42.27607727050781
production_forward grad[28] vs paper_forward: mean_abs=0.9766052961349487, max_abs=6.0, mean_rel=0.36472052335739136, max_rel=3343.749755859375, norm_rel=0.023456813767552376, ref_abs_avg=41.825439453125, test_abs_avg=41.82497024536133
production_forward grad[29] vs paper_forward: mean_abs=0.7663419246673584, max_abs=2.625, mean_rel=0.3443754315376282, max_rel=42.42625427246094, norm_rel=0.02432161569595337, ref_abs_avg=31.108558654785156, test_abs_avg=31.126508712768555
production_forward grad[30] vs paper_forward: mean_abs=0.9785837531089783, max_abs=7.5, mean_rel=0.16818425059318542, max_rel=1794.621826171875, norm_rel=0.02547857165336609, ref_abs_avg=38.608055114746094, test_abs_avg=38.60634994506836
production_forward grad[31] vs paper_forward: mean_abs=0.9177613854408264, max_abs=6.75, mean_rel=0.3277144432067871, max_rel=4343.75, norm_rel=0.02409685216844082, ref_abs_avg=38.167057037353516, test_abs_avg=38.160423278808594
production_forward grad[32] vs paper_forward: mean_abs=0.7409960031509399, max_abs=2.90625, mean_rel=0.31822657585144043, max_rel=38.30865478515625, norm_rel=0.02351268008351326, ref_abs_avg=31.432157516479492, test_abs_avg=31.437095642089844
production_forward grad[33] vs paper_forward: mean_abs=0.9123128652572632, max_abs=6.0, mean_rel=0.16722355782985687, max_rel=1018.319091796875, norm_rel=0.025181610137224197, ref_abs_avg=36.377220153808594, test_abs_avg=36.37696838378906
production_forward grad[34] vs paper_forward: mean_abs=0.8484143614768982, max_abs=5.4375, mean_rel=0.296663761138916, max_rel=2468.75, norm_rel=0.02362559363245964, ref_abs_avg=35.991676330566406, test_abs_avg=35.99531555175781
production_forward grad[35] vs paper_forward: mean_abs=0.6726857423782349, max_abs=2.5, mean_rel=0.1925402134656906, max_rel=41.71101760864258, norm_rel=0.024452613666653633, ref_abs_avg=26.90456199645996, test_abs_avg=26.92673683166504
production_forward grad[36] vs paper_forward: mean_abs=0.8546080589294434, max_abs=6.0, mean_rel=0.17260012030601501, max_rel=1727.7989501953125, norm_rel=0.024966975674033165, ref_abs_avg=34.353328704833984, test_abs_avg=34.35035705566406
production_forward grad[37] vs paper_forward: mean_abs=0.8046363592147827, max_abs=5.375, mean_rel=0.3064156770706177, max_rel=2749.999755859375, norm_rel=0.023789118975400925, ref_abs_avg=33.95781707763672, test_abs_avg=33.9594612121582
production_forward grad[38] vs paper_forward: mean_abs=0.6307029724121094, max_abs=2.25, mean_rel=0.08139456808567047, max_rel=2.6326472759246826, norm_rel=0.022689973935484886, ref_abs_avg=27.78165054321289, test_abs_avg=27.821334838867188
production_forward grad[39] vs paper_forward: mean_abs=0.8041412830352783, max_abs=5.5, mean_rel=0.15902400016784668, max_rel=2000.5111083984375, norm_rel=0.02485661208629608, ref_abs_avg=32.466102600097656, test_abs_avg=32.468074798583984
production_forward grad[40] vs paper_forward: mean_abs=0.7511862516403198, max_abs=4.875, mean_rel=0.2855498194694519, max_rel=3031.249755859375, norm_rel=0.023262765258550644, ref_abs_avg=32.355186462402344, test_abs_avg=32.352909088134766
production_forward grad[41] vs paper_forward: mean_abs=0.5923879146575928, max_abs=2.25, mean_rel=0.09733831882476807, max_rel=7.6991963386535645, norm_rel=0.023704083636403084, ref_abs_avg=25.678850173950195, test_abs_avg=25.769014358520508
production_forward grad[42] vs paper_forward: mean_abs=0.7634930610656738, max_abs=5.5, mean_rel=0.15869569778442383, max_rel=793.5921020507812, norm_rel=0.0244857519865036, ref_abs_avg=31.29768180847168, test_abs_avg=31.295703887939453
production_forward grad[43] vs paper_forward: mean_abs=0.705115795135498, max_abs=4.0, mean_rel=0.25487712025642395, max_rel=2624.999755859375, norm_rel=0.022911904379725456, ref_abs_avg=30.82806396484375, test_abs_avg=30.832046508789062
production_forward grad[44] vs paper_forward: mean_abs=0.5723237991333008, max_abs=2.25, mean_rel=0.0845445990562439, max_rel=6.2786664962768555, norm_rel=0.02312231995165348, ref_abs_avg=24.554292678833008, test_abs_avg=24.57278060913086
production_forward grad[45] vs paper_forward: mean_abs=0.727101743221283, max_abs=6.0, mean_rel=0.16015729308128357, max_rel=914.77001953125, norm_rel=0.02431141398847103, ref_abs_avg=30.016223907470703, test_abs_avg=30.01399040222168
production_forward grad[46] vs paper_forward: mean_abs=0.6738213300704956, max_abs=4.25, mean_rel=0.29492640495300293, max_rel=1499.9998779296875, norm_rel=0.022968817502260208, ref_abs_avg=29.46951675415039, test_abs_avg=29.471162796020508
production_forward grad[47] vs paper_forward: mean_abs=0.5337905883789062, max_abs=2.25, mean_rel=0.09539161622524261, max_rel=5.986089706420898, norm_rel=0.023160871118307114, ref_abs_avg=23.75469970703125, test_abs_avg=23.702991485595703
production_forward grad[48] vs paper_forward: mean_abs=0.6930289268493652, max_abs=6.0, mean_rel=0.1542789191007614, max_rel=1285.558837890625, norm_rel=0.024265291169285774, ref_abs_avg=28.6768798828125, test_abs_avg=28.674827575683594
production_forward grad[49] vs paper_forward: mean_abs=0.6451272368431091, max_abs=4.0, mean_rel=0.2516384720802307, max_rel=1906.2498779296875, norm_rel=0.02266654185950756, ref_abs_avg=28.48326873779297, test_abs_avg=28.485183715820312
production_forward grad[50] vs paper_forward: mean_abs=0.6192336678504944, max_abs=2.75, mean_rel=0.10106727480888367, max_rel=8.976168632507324, norm_rel=0.024874690920114517, ref_abs_avg=25.58939552307129, test_abs_avg=25.617534637451172
production_forward grad[51] vs paper_forward: mean_abs=0.776975154876709, max_abs=6.0, mean_rel=0.16745439171791077, max_rel=1350.948974609375, norm_rel=0.025188561528921127, ref_abs_avg=30.96274185180664, test_abs_avg=30.963069915771484
production_forward grad[52] vs paper_forward: mean_abs=0.7256664633750916, max_abs=4.75, mean_rel=0.29730498790740967, max_rel=1999.9998779296875, norm_rel=0.024000948294997215, ref_abs_avg=30.39059066772461, test_abs_avg=30.39298439025879
production_forward grad[53] vs paper_forward: mean_abs=0.5687747001647949, max_abs=2.0, mean_rel=0.09234754741191864, max_rel=5.982662677764893, norm_rel=0.024125399067997932, ref_abs_avg=24.05785369873047, test_abs_avg=24.089614868164062
production_forward grad[54] vs paper_forward: mean_abs=0.7234959602355957, max_abs=5.25, mean_rel=0.1580907255411148, max_rel=957.6349487304688, norm_rel=0.024968694895505905, ref_abs_avg=29.054903030395508, test_abs_avg=29.053607940673828
production_forward grad[55] vs paper_forward: mean_abs=0.6662845015525818, max_abs=4.3125, mean_rel=0.2946946322917938, max_rel=2937.499755859375, norm_rel=0.02346036769449711, ref_abs_avg=28.4444522857666, test_abs_avg=28.44991683959961
production_forward grad[56] vs paper_forward: mean_abs=0.490053653717041, max_abs=2.51953125, mean_rel=0.09568548202514648, max_rel=4.769379615783691, norm_rel=0.022258572280406952, ref_abs_avg=21.982091903686523, test_abs_avg=21.976356506347656
production_forward grad[57] vs paper_forward: mean_abs=0.6670671105384827, max_abs=4.75, mean_rel=0.1598357856273651, max_rel=728.043701171875, norm_rel=0.02447471208870411, ref_abs_avg=27.327598571777344, test_abs_avg=27.325408935546875
production_forward grad[58] vs paper_forward: mean_abs=0.624226450920105, max_abs=5.0, mean_rel=0.2583187520503998, max_rel=2062.5, norm_rel=0.023146258667111397, ref_abs_avg=27.056968688964844, test_abs_avg=27.059057235717773
production_forward grad[59] vs paper_forward: mean_abs=0.4941157102584839, max_abs=1.75, mean_rel=0.10197319835424423, max_rel=9.174827575683594, norm_rel=0.022208046168088913, ref_abs_avg=22.12683868408203, test_abs_avg=22.128055572509766
production_forward grad[60] vs paper_forward: mean_abs=0.6301783919334412, max_abs=5.0, mean_rel=0.14338305592536926, max_rel=1166.49365234375, norm_rel=0.023941360414028168, ref_abs_avg=26.361270904541016, test_abs_avg=26.361583709716797
production_forward grad[61] vs paper_forward: mean_abs=0.5808776617050171, max_abs=4.0, mean_rel=0.2064710557460785, max_rel=2500.0, norm_rel=0.022602621465921402, ref_abs_avg=25.74637222290039, test_abs_avg=25.737768173217773
production_forward grad[62] vs paper_forward: mean_abs=0.45759618282318115, max_abs=2.125, mean_rel=0.07526277005672455, max_rel=4.882099151611328, norm_rel=0.02158961445093155, ref_abs_avg=21.451648712158203, test_abs_avg=21.482351303100586
production_forward grad[63] vs paper_forward: mean_abs=0.5932465195655823, max_abs=5.0, mean_rel=0.16034433245658875, max_rel=1145.9248046875, norm_rel=0.02372090145945549, ref_abs_avg=25.05724334716797, test_abs_avg=25.056053161621094
production_forward grad[64] vs paper_forward: mean_abs=0.5409322381019592, max_abs=4.0, mean_rel=0.24887020885944366, max_rel=2156.25, norm_rel=0.022175688296556473, ref_abs_avg=24.444364547729492, test_abs_avg=24.44702911376953
production_forward grad[65] vs paper_forward: mean_abs=0.4384329319000244, max_abs=1.859375, mean_rel=0.13420450687408447, max_rel=18.062084197998047, norm_rel=0.020897364243865013, ref_abs_avg=20.376632690429688, test_abs_avg=20.382186889648438
production_forward grad[66] vs paper_forward: mean_abs=0.5660401582717896, max_abs=4.5, mean_rel=0.14653395116329193, max_rel=903.5748291015625, norm_rel=0.023194806650280952, ref_abs_avg=24.439380645751953, test_abs_avg=24.43995475769043
production_forward grad[67] vs paper_forward: mean_abs=0.512433648109436, max_abs=4.0, mean_rel=0.2213689088821411, max_rel=1531.2498779296875, norm_rel=0.02145254984498024, ref_abs_avg=23.885929107666016, test_abs_avg=23.888084411621094
production_forward grad[68] vs paper_forward: mean_abs=0.41551637649536133, max_abs=1.78125, mean_rel=0.08495993912220001, max_rel=5.350304126739502, norm_rel=0.023709731176495552, ref_abs_avg=18.075881958007812, test_abs_avg=18.116668701171875
production_forward grad[69] vs paper_forward: mean_abs=0.5321187973022461, max_abs=4.0, mean_rel=0.14555761218070984, max_rel=1036.135498046875, norm_rel=0.02295183762907982, ref_abs_avg=23.23097801208496, test_abs_avg=23.23067283630371
production_forward grad[70] vs paper_forward: mean_abs=0.4945070743560791, max_abs=3.25, mean_rel=0.1921089142560959, max_rel=1109.375, norm_rel=0.021386489272117615, ref_abs_avg=23.151782989501953, test_abs_avg=23.153682708740234
production_forward grad[71] vs paper_forward: mean_abs=0.4121372699737549, max_abs=1.560546875, mean_rel=0.08784694969654083, max_rel=7.870990753173828, norm_rel=0.022135969251394272, ref_abs_avg=18.986316680908203, test_abs_avg=18.97992515563965
production_forward grad[72] vs paper_forward: mean_abs=0.5161213278770447, max_abs=5.0, mean_rel=0.1407782882452011, max_rel=845.9219970703125, norm_rel=0.022570425644516945, ref_abs_avg=22.880176544189453, test_abs_avg=22.878158569335938
production_forward grad[73] vs paper_forward: mean_abs=0.4766884446144104, max_abs=4.0, mean_rel=0.17601922154426575, max_rel=1421.8748779296875, norm_rel=0.021299269050359726, ref_abs_avg=22.46895980834961, test_abs_avg=22.471237182617188
production_forward grad[74] vs paper_forward: mean_abs=0.4326965808868408, max_abs=1.359375, mean_rel=0.09189224243164062, max_rel=13.238415718078613, norm_rel=0.022763393819332123, ref_abs_avg=18.983806610107422, test_abs_avg=18.99970245361328
production_forward grad[75] vs paper_forward: mean_abs=0.5418070554733276, max_abs=4.5, mean_rel=0.14760129153728485, max_rel=1098.39404296875, norm_rel=0.024243535473942757, ref_abs_avg=22.401782989501953, test_abs_avg=22.39908218383789
production_forward grad[76] vs paper_forward: mean_abs=0.5006730556488037, max_abs=3.5, mean_rel=0.2194749116897583, max_rel=1624.9998779296875, norm_rel=0.02248486317694187, ref_abs_avg=22.303537368774414, test_abs_avg=22.305103302001953
production_forward grad[77] vs paper_forward: mean_abs=0.39717817306518555, max_abs=2.0625, mean_rel=0.09106438606977463, max_rel=11.729243278503418, norm_rel=0.023598743602633476, ref_abs_avg=17.021465301513672, test_abs_avg=17.026180267333984
production_forward grad[78] vs paper_forward: mean_abs=0.5056310296058655, max_abs=4.5, mean_rel=0.14816993474960327, max_rel=902.3272094726562, norm_rel=0.023867372423410416, ref_abs_avg=21.223621368408203, test_abs_avg=21.222959518432617
production_forward grad[79] vs paper_forward: mean_abs=0.4678857624530792, max_abs=4.25, mean_rel=0.22891050577163696, max_rel=2156.25, norm_rel=0.022699298337101936, ref_abs_avg=20.673660278320312, test_abs_avg=20.672626495361328
production_forward grad[80] vs paper_forward: mean_abs=0.36754369735717773, max_abs=1.296875, mean_rel=0.09934179484844208, max_rel=5.287904739379883, norm_rel=0.02218606509268284, ref_abs_avg=16.05415153503418, test_abs_avg=16.029277801513672
production_forward grad[81] vs paper_forward: mean_abs=0.46653011441230774, max_abs=4.5, mean_rel=0.15244662761688232, max_rel=962.7755126953125, norm_rel=0.02326507866382599, ref_abs_avg=20.118576049804688, test_abs_avg=20.118099212646484
production_forward grad[82] vs paper_forward: mean_abs=0.43613284826278687, max_abs=3.578125, mean_rel=0.2086217701435089, max_rel=1499.9998779296875, norm_rel=0.022014794871211052, ref_abs_avg=19.930217742919922, test_abs_avg=19.94134521484375
production_forward grad[83] vs paper_forward: mean_abs=0.3263424038887024, max_abs=1.375, mean_rel=0.25503432750701904, max_rel=93.0511703491211, norm_rel=0.020389514043927193, ref_abs_avg=16.328575134277344, test_abs_avg=16.35116195678711
production_forward grad[84] vs paper_forward: mean_abs=0.4424129128456116, max_abs=5.5, mean_rel=0.13952383399009705, max_rel=1051.17578125, norm_rel=0.02266653999686241, ref_abs_avg=19.60647201538086, test_abs_avg=19.60487174987793
production_forward grad[85] vs paper_forward: mean_abs=0.4042922854423523, max_abs=3.59375, mean_rel=0.19292788207530975, max_rel=718.7499389648438, norm_rel=0.021684663370251656, ref_abs_avg=18.807056427001953, test_abs_avg=18.810047149658203
production_forward grad[86] vs paper_forward: mean_abs=0.31888842582702637, max_abs=1.0625, mean_rel=0.08547858893871307, max_rel=14.45501708984375, norm_rel=0.02041471004486084, ref_abs_avg=15.391060829162598, test_abs_avg=15.353580474853516
production_forward grad[87] vs paper_forward: mean_abs=0.4113520085811615, max_abs=5.75, mean_rel=0.13377916812896729, max_rel=709.3445434570312, norm_rel=0.022022271528840065, ref_abs_avg=18.79989242553711, test_abs_avg=18.798675537109375
production_forward grad[88] vs paper_forward: mean_abs=0.37185004353523254, max_abs=3.25, mean_rel=0.16736164689064026, max_rel=1187.5, norm_rel=0.020019395276904106, ref_abs_avg=18.581233978271484, test_abs_avg=18.581104278564453
production_forward grad[89] vs paper_forward: mean_abs=0.30770206451416016, max_abs=1.2919921875, mean_rel=0.11178940534591675, max_rel=7.107675552368164, norm_rel=0.020334986969828606, ref_abs_avg=15.649442672729492, test_abs_avg=15.631224632263184
production_forward grad[90] vs paper_forward: mean_abs=0.39767202734947205, max_abs=3.875, mean_rel=0.1385965496301651, max_rel=1189.4267578125, norm_rel=0.021844683215022087, ref_abs_avg=18.38957977294922, test_abs_avg=18.390045166015625
production_forward grad[91] vs paper_forward: mean_abs=0.3592665195465088, max_abs=3.25, mean_rel=0.16265617311000824, max_rel=1125.0, norm_rel=0.01986437477171421, ref_abs_avg=18.13667869567871, test_abs_avg=18.139883041381836
production_forward grad[92] vs paper_forward: mean_abs=0.2845492362976074, max_abs=1.5, mean_rel=0.11679625511169434, max_rel=14.35117244720459, norm_rel=0.019589565694332123, ref_abs_avg=14.859167098999023, test_abs_avg=14.86878490447998
production_forward grad[93] vs paper_forward: mean_abs=0.36749887466430664, max_abs=4.0, mean_rel=0.12817847728729248, max_rel=742.482666015625, norm_rel=0.020969849079847336, ref_abs_avg=17.729942321777344, test_abs_avg=17.73065948486328
production_forward grad[94] vs paper_forward: mean_abs=0.3309711217880249, max_abs=3.125, mean_rel=0.16892024874687195, max_rel=1218.75, norm_rel=0.019090132787823677, ref_abs_avg=17.540678024291992, test_abs_avg=17.545063018798828
production_forward grad[95] vs paper_forward: mean_abs=0.2761402130126953, max_abs=1.125, mean_rel=0.08443284034729004, max_rel=8.630289077758789, norm_rel=0.021002130582928658, ref_abs_avg=13.244653701782227, test_abs_avg=13.24276351928711
production_forward grad[96] vs paper_forward: mean_abs=0.34899818897247314, max_abs=4.0, mean_rel=0.11943434178829193, max_rel=505.67230224609375, norm_rel=0.020897749811410904, ref_abs_avg=16.97920799255371, test_abs_avg=16.977466583251953
production_forward grad[97] vs paper_forward: mean_abs=0.3240957260131836, max_abs=3.75, mean_rel=0.17392054200172424, max_rel=1437.4998779296875, norm_rel=0.01954592950642109, ref_abs_avg=16.944499969482422, test_abs_avg=16.941612243652344
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016244162106886506, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008493075147271156, max_abs=0.421875, mean_rel=0.07415503263473511, max_rel=137.02963256835938, norm_rel=0.02020343393087387, ref_abs_avg=0.4524350166320801, test_abs_avg=0.45242631435394287
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.242634296417236, max_abs=80.0, mean_rel=0.14005106687545776, max_rel=172.02635192871094, norm_rel=0.020224545150995255, ref_abs_avg=314.0838928222656, test_abs_avg=314.123046875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2934150695800781, max_abs=4.5, mean_rel=0.10177043825387955, max_rel=5.7390031814575195, norm_rel=0.025321418419480324, ref_abs_avg=51.695945739746094, test_abs_avg=51.70363998413086
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6179726123809814, max_abs=12.0, mean_rel=0.168955996632576, max_rel=1244.267578125, norm_rel=0.025128092616796494, ref_abs_avg=64.81581115722656, test_abs_avg=64.8187255859375
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.497746229171753, max_abs=10.0, mean_rel=0.3550066351890564, max_rel=2500.0, norm_rel=0.0235458854585886, ref_abs_avg=63.85792541503906, test_abs_avg=63.852081298828125
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0925006866455078, max_abs=3.75, mean_rel=0.13243834674358368, max_rel=19.26250457763672, norm_rel=0.02305280603468418, ref_abs_avg=46.34657287597656, test_abs_avg=46.42959976196289
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.403890609741211, max_abs=10.0, mean_rel=0.1653965711593628, max_rel=2480.860107421875, norm_rel=0.024875035509467125, ref_abs_avg=56.77775192260742, test_abs_avg=56.775508880615234
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3013502359390259, max_abs=8.21875, mean_rel=0.357565701007843, max_rel=3609.374755859375, norm_rel=0.023331888020038605, ref_abs_avg=56.00895690917969, test_abs_avg=56.00465393066406
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0174517631530762, max_abs=3.75, mean_rel=0.18791842460632324, max_rel=43.62467956542969, norm_rel=0.0249395240098238, ref_abs_avg=41.21363067626953, test_abs_avg=41.24633026123047
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2642452716827393, max_abs=9.0, mean_rel=0.17765605449676514, max_rel=1898.5611572265625, norm_rel=0.02458101697266102, ref_abs_avg=51.71076965332031, test_abs_avg=51.715370178222656
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.167262077331543, max_abs=6.75, mean_rel=0.3385276198387146, max_rel=3968.749755859375, norm_rel=0.023039590567350388, ref_abs_avg=50.914794921875, test_abs_avg=50.90504837036133
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.8681674003601074, max_abs=3.5, mean_rel=0.11784464120864868, max_rel=12.183459281921387, norm_rel=0.023930547758936882, ref_abs_avg=37.534263610839844, test_abs_avg=37.461830139160156
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.167711853981018, max_abs=8.5, mean_rel=0.16297048330307007, max_rel=1495.2103271484375, norm_rel=0.024383816868066788, ref_abs_avg=48.20590591430664, test_abs_avg=48.20407485961914
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.0774319171905518, max_abs=6.5, mean_rel=0.3048667311668396, max_rel=3437.499755859375, norm_rel=0.022731836885213852, ref_abs_avg=47.61308288574219, test_abs_avg=47.608741760253906
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8413398861885071, max_abs=3.75, mean_rel=0.4549209475517273, max_rel=187.27679443359375, norm_rel=0.022218385711312294, ref_abs_avg=38.398475646972656, test_abs_avg=38.34900665283203
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0981649160385132, max_abs=8.5, mean_rel=0.14931169152259827, max_rel=905.9107055664062, norm_rel=0.02419540286064148, ref_abs_avg=45.661468505859375, test_abs_avg=45.6595344543457
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0150585174560547, max_abs=6.75, mean_rel=0.29171156883239746, max_rel=3531.249755859375, norm_rel=0.02260040119290352, ref_abs_avg=45.16523361206055, test_abs_avg=45.16493606567383
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8283376693725586, max_abs=3.375, mean_rel=0.11945189535617828, max_rel=29.57694435119629, norm_rel=0.02284785360097885, ref_abs_avg=36.58955001831055, test_abs_avg=36.5894889831543
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0317282676696777, max_abs=7.0, mean_rel=0.16858306527137756, max_rel=1249.155517578125, norm_rel=0.024102898314595222, ref_abs_avg=43.10002517700195, test_abs_avg=43.096763610839844
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9513141512870789, max_abs=6.0, mean_rel=0.3156413435935974, max_rel=3671.874755859375, norm_rel=0.02232576720416546, ref_abs_avg=42.7528076171875, test_abs_avg=42.753074645996094
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7594871520996094, max_abs=3.0, mean_rel=0.09986592829227448, max_rel=9.569671630859375, norm_rel=0.022673122584819794, ref_abs_avg=33.62725067138672, test_abs_avg=33.68559265136719
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9825139045715332, max_abs=7.0, mean_rel=0.14948710799217224, max_rel=2200.21337890625, norm_rel=0.02393236570060253, ref_abs_avg=41.320228576660156, test_abs_avg=41.31771469116211
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.8999601602554321, max_abs=5.5, mean_rel=0.33603155612945557, max_rel=3062.499755859375, norm_rel=0.022137247025966644, ref_abs_avg=40.88514709472656, test_abs_avg=40.8798828125
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7168015241622925, max_abs=3.5, mean_rel=1.0292121171951294, max_rel=493.2245788574219, norm_rel=0.022437434643507004, ref_abs_avg=32.42936325073242, test_abs_avg=32.42311096191406
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9324333071708679, max_abs=7.0, mean_rel=0.15732738375663757, max_rel=1640.4205322265625, norm_rel=0.023770080879330635, ref_abs_avg=39.494239807128906, test_abs_avg=39.49331283569336
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8564859628677368, max_abs=5.625, mean_rel=0.28242647647857666, max_rel=2203.125, norm_rel=0.02187228389084339, ref_abs_avg=39.384620666503906, test_abs_avg=39.383262634277344
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8436431884765625, max_abs=3.5, mean_rel=0.11249761283397675, max_rel=19.622167587280273, norm_rel=0.023600788787007332, ref_abs_avg=36.23738479614258, test_abs_avg=36.198123931884766
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0819240808486938, max_abs=7.578125, mean_rel=0.17162880301475525, max_rel=1339.2784423828125, norm_rel=0.025741474702954292, ref_abs_avg=42.27507019042969, test_abs_avg=42.274818420410156
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0024932622909546, max_abs=6.0, mean_rel=0.35525578260421753, max_rel=3124.999755859375, norm_rel=0.024064021185040474, ref_abs_avg=41.825439453125, test_abs_avg=41.82297897338867
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7919001579284668, max_abs=2.875, mean_rel=0.26777157187461853, max_rel=27.72892189025879, norm_rel=0.024989094585180283, ref_abs_avg=31.108558654785156, test_abs_avg=31.106229782104492
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9981191754341125, max_abs=6.5, mean_rel=0.17258310317993164, max_rel=3038.23046875, norm_rel=0.02598528005182743, ref_abs_avg=38.608055114746094, test_abs_avg=38.60529327392578
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9397040605545044, max_abs=6.5, mean_rel=0.3228048086166382, max_rel=3437.499755859375, norm_rel=0.024684559553861618, ref_abs_avg=38.167057037353516, test_abs_avg=38.162193298339844
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7552541494369507, max_abs=3.25, mean_rel=0.2639979124069214, max_rel=41.25999450683594, norm_rel=0.023934535682201385, ref_abs_avg=31.432157516479492, test_abs_avg=31.427988052368164
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9289426803588867, max_abs=6.0, mean_rel=0.16607366502285004, max_rel=1342.066650390625, norm_rel=0.025642860680818558, ref_abs_avg=36.377220153808594, test_abs_avg=36.37583923339844
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8670074343681335, max_abs=5.25, mean_rel=0.31709033250808716, max_rel=2562.5, norm_rel=0.02414620853960514, ref_abs_avg=35.991676330566406, test_abs_avg=35.994293212890625
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6988859176635742, max_abs=3.0, mean_rel=0.11617603898048401, max_rel=6.896398544311523, norm_rel=0.02566535770893097, ref_abs_avg=26.90456199645996, test_abs_avg=26.90636444091797
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.869240403175354, max_abs=6.0, mean_rel=0.17205220460891724, max_rel=2081.288330078125, norm_rel=0.025388887152075768, ref_abs_avg=34.353328704833984, test_abs_avg=34.35010528564453
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8197699785232544, max_abs=5.0, mean_rel=0.31380021572113037, max_rel=2859.374755859375, norm_rel=0.024211321026086807, ref_abs_avg=33.95781707763672, test_abs_avg=33.96101379394531
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6658973693847656, max_abs=2.75, mean_rel=0.09153243899345398, max_rel=3.969865083694458, norm_rel=0.024082202464342117, ref_abs_avg=27.78165054321289, test_abs_avg=27.829261779785156
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8170617818832397, max_abs=5.5, mean_rel=0.16009947657585144, max_rel=1587.1024169921875, norm_rel=0.02523757331073284, ref_abs_avg=32.466102600097656, test_abs_avg=32.46750259399414
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.762362003326416, max_abs=5.0, mean_rel=0.26242581009864807, max_rel=2687.499755859375, norm_rel=0.023612894117832184, ref_abs_avg=32.355186462402344, test_abs_avg=32.35483932495117
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6209850311279297, max_abs=2.375, mean_rel=0.09378352016210556, max_rel=6.308037281036377, norm_rel=0.024382231757044792, ref_abs_avg=25.678850173950195, test_abs_avg=25.78180694580078
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7752072215080261, max_abs=6.0, mean_rel=0.16204750537872314, max_rel=1266.875732421875, norm_rel=0.02485405094921589, ref_abs_avg=31.29768180847168, test_abs_avg=31.295448303222656
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7168533802032471, max_abs=4.375, mean_rel=0.25436264276504517, max_rel=2562.5, norm_rel=0.02329237014055252, ref_abs_avg=30.82806396484375, test_abs_avg=30.827011108398438
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5571451187133789, max_abs=2.0, mean_rel=0.07979539036750793, max_rel=4.425856590270996, norm_rel=0.022986091673374176, ref_abs_avg=24.554292678833008, test_abs_avg=24.585487365722656
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7368226051330566, max_abs=5.0, mean_rel=0.1632462739944458, max_rel=923.281494140625, norm_rel=0.024639219045639038, ref_abs_avg=30.016223907470703, test_abs_avg=30.013826370239258
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6841023564338684, max_abs=4.28125, mean_rel=0.2966327667236328, max_rel=1812.4998779296875, norm_rel=0.0233065914362669, ref_abs_avg=29.46951675415039, test_abs_avg=29.470645904541016
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5371833443641663, max_abs=2.15625, mean_rel=0.10164736211299896, max_rel=8.300004005432129, norm_rel=0.023307638242840767, ref_abs_avg=23.75469970703125, test_abs_avg=23.70758819580078
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7022628784179688, max_abs=7.0, mean_rel=0.15786761045455933, max_rel=1426.5498046875, norm_rel=0.02458089590072632, ref_abs_avg=28.6768798828125, test_abs_avg=28.673297882080078
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6562419533729553, max_abs=4.375, mean_rel=0.26106756925582886, max_rel=2125.0, norm_rel=0.023070337250828743, ref_abs_avg=28.48326873779297, test_abs_avg=28.48261260986328
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6416540145874023, max_abs=2.25, mean_rel=0.08585907518863678, max_rel=4.8978962898254395, norm_rel=0.025737816467881203, ref_abs_avg=25.58939552307129, test_abs_avg=25.622966766357422
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7884730696678162, max_abs=5.5, mean_rel=0.17434294521808624, max_rel=1882.5059814453125, norm_rel=0.025558611378073692, ref_abs_avg=30.96274185180664, test_abs_avg=30.959842681884766
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7370893955230713, max_abs=4.5, mean_rel=0.30508890748023987, max_rel=2500.0, norm_rel=0.024375615641474724, ref_abs_avg=30.39059066772461, test_abs_avg=30.391767501831055
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5680389404296875, max_abs=2.0, mean_rel=0.08754892647266388, max_rel=5.220708847045898, norm_rel=0.024186130613088608, ref_abs_avg=24.05785369873047, test_abs_avg=24.11440658569336
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.733116626739502, max_abs=5.0, mean_rel=0.15914461016654968, max_rel=836.4889526367188, norm_rel=0.02528795786201954, ref_abs_avg=29.054903030395508, test_abs_avg=29.053333282470703
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6780517101287842, max_abs=4.28125, mean_rel=0.2943648397922516, max_rel=2375.0, norm_rel=0.023869531229138374, ref_abs_avg=28.4444522857666, test_abs_avg=28.446805953979492
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5007996559143066, max_abs=2.00390625, mean_rel=0.09481392800807953, max_rel=4.134815216064453, norm_rel=0.022189714014530182, ref_abs_avg=21.982091903686523, test_abs_avg=21.964174270629883
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6771082878112793, max_abs=5.0, mean_rel=0.15934604406356812, max_rel=1107.2474365234375, norm_rel=0.024827543646097183, ref_abs_avg=27.327598571777344, test_abs_avg=27.32468032836914
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6351360082626343, max_abs=4.8125, mean_rel=0.2664114236831665, max_rel=2062.5, norm_rel=0.02353905327618122, ref_abs_avg=27.056968688964844, test_abs_avg=27.056686401367188
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5065634250640869, max_abs=2.03125, mean_rel=0.09991423785686493, max_rel=13.217591285705566, norm_rel=0.02261306531727314, ref_abs_avg=22.12683868408203, test_abs_avg=22.131668090820312
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6372334957122803, max_abs=6.0, mean_rel=0.1464349925518036, max_rel=953.9939575195312, norm_rel=0.024218132719397545, ref_abs_avg=26.361270904541016, test_abs_avg=26.36048698425293
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5891176462173462, max_abs=4.25, mean_rel=0.20637838542461395, max_rel=1781.2498779296875, norm_rel=0.022945918142795563, ref_abs_avg=25.74637222290039, test_abs_avg=25.736207962036133
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4543285369873047, max_abs=1.5625, mean_rel=0.0770203247666359, max_rel=3.87813663482666, norm_rel=0.021396495401859283, ref_abs_avg=21.451648712158203, test_abs_avg=21.49357032775879
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5996831059455872, max_abs=5.0, mean_rel=0.1607779860496521, max_rel=1016.5799560546875, norm_rel=0.023966148495674133, ref_abs_avg=25.05724334716797, test_abs_avg=25.05620765686035
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5495442152023315, max_abs=4.0, mean_rel=0.2503853142261505, max_rel=1937.4998779296875, norm_rel=0.022531477734446526, ref_abs_avg=24.444364547729492, test_abs_avg=24.445940017700195
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4380870461463928, max_abs=2.015625, mean_rel=0.14057627320289612, max_rel=19.40510368347168, norm_rel=0.021583331748843193, ref_abs_avg=20.376632690429688, test_abs_avg=20.38010597229004
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5718955993652344, max_abs=4.875, mean_rel=0.14875954389572144, max_rel=1086.14599609375, norm_rel=0.02340387925505638, ref_abs_avg=24.439380645751953, test_abs_avg=24.439374923706055
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5204126834869385, max_abs=3.5, mean_rel=0.21720513701438904, max_rel=1718.7498779296875, norm_rel=0.021790752187371254, ref_abs_avg=23.885929107666016, test_abs_avg=23.884828567504883
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.43320155143737793, max_abs=1.75, mean_rel=0.08994565904140472, max_rel=6.166929244995117, norm_rel=0.02434387244284153, ref_abs_avg=18.075881958007812, test_abs_avg=18.116695404052734
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5367015600204468, max_abs=4.5, mean_rel=0.1454332023859024, max_rel=1213.7677001953125, norm_rel=0.02316044457256794, ref_abs_avg=23.23097801208496, test_abs_avg=23.230485916137695
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.500762939453125, max_abs=3.40625, mean_rel=0.20777824521064758, max_rel=1437.4998779296875, norm_rel=0.021649548783898354, ref_abs_avg=23.151782989501953, test_abs_avg=23.15228843688965
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4162027835845947, max_abs=1.65625, mean_rel=0.10630833357572556, max_rel=7.870990753173828, norm_rel=0.02196844480931759, ref_abs_avg=18.986316680908203, test_abs_avg=18.983509063720703
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5192683935165405, max_abs=5.0, mean_rel=0.1383763700723648, max_rel=725.6312866210938, norm_rel=0.0227157361805439, ref_abs_avg=22.880176544189453, test_abs_avg=22.8781795501709
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.48112568259239197, max_abs=4.0, mean_rel=0.17603668570518494, max_rel=1250.0, norm_rel=0.02149643562734127, ref_abs_avg=22.46895980834961, test_abs_avg=22.470542907714844
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.437267541885376, max_abs=1.625, mean_rel=0.109623983502388, max_rel=19.673124313354492, norm_rel=0.02326374687254429, ref_abs_avg=18.983806610107422, test_abs_avg=19.007984161376953
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5480421185493469, max_abs=4.0, mean_rel=0.14572657644748688, max_rel=1150.0467529296875, norm_rel=0.024504130706191063, ref_abs_avg=22.401782989501953, test_abs_avg=22.398921966552734
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5053199529647827, max_abs=3.5, mean_rel=0.2351488471031189, max_rel=2156.25, norm_rel=0.022679049521684647, ref_abs_avg=22.303537368774414, test_abs_avg=22.306049346923828
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.40293216705322266, max_abs=2.15625, mean_rel=0.09324261546134949, max_rel=14.261122703552246, norm_rel=0.02409597672522068, ref_abs_avg=17.021465301513672, test_abs_avg=17.021896362304688
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5105692148208618, max_abs=4.0, mean_rel=0.1507377028465271, max_rel=576.2037353515625, norm_rel=0.024092158302664757, ref_abs_avg=21.223621368408203, test_abs_avg=21.222997665405273
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.47185954451560974, max_abs=3.75, mean_rel=0.23166874051094055, max_rel=1414.0623779296875, norm_rel=0.022867193445563316, ref_abs_avg=20.673660278320312, test_abs_avg=20.676023483276367
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.38232898712158203, max_abs=1.34375, mean_rel=0.09686820209026337, max_rel=4.726585388183594, norm_rel=0.023062150925397873, ref_abs_avg=16.05415153503418, test_abs_avg=16.036487579345703
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.46994781494140625, max_abs=4.0, mean_rel=0.15522003173828125, max_rel=1101.417236328125, norm_rel=0.023411672562360764, ref_abs_avg=20.118576049804688, test_abs_avg=20.118745803833008
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4389467239379883, max_abs=3.5, mean_rel=0.20934049785137177, max_rel=1437.4998779296875, norm_rel=0.02212022803723812, ref_abs_avg=19.930217742919922, test_abs_avg=19.940799713134766
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3452771306037903, max_abs=1.6328125, mean_rel=0.31150075793266296, max_rel=120.65443420410156, norm_rel=0.02136806771159172, ref_abs_avg=16.328575134277344, test_abs_avg=16.34023666381836
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.44571173191070557, max_abs=5.0, mean_rel=0.14086969196796417, max_rel=1012.6943969726562, norm_rel=0.022818461060523987, ref_abs_avg=19.60647201538086, test_abs_avg=19.60433578491211
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4046591520309448, max_abs=3.25, mean_rel=0.18745830655097961, max_rel=1031.25, norm_rel=0.021643565967679024, ref_abs_avg=18.807056427001953, test_abs_avg=18.807865142822266
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3061046004295349, max_abs=1.0625, mean_rel=0.061145659536123276, max_rel=5.953721523284912, norm_rel=0.01964591257274151, ref_abs_avg=15.391060829162598, test_abs_avg=15.366493225097656
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4139394164085388, max_abs=4.125, mean_rel=0.1349254846572876, max_rel=781.4188842773438, norm_rel=0.022162046283483505, ref_abs_avg=18.79989242553711, test_abs_avg=18.798425674438477
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.37780314683914185, max_abs=3.5, mean_rel=0.1730540245771408, max_rel=1250.0, norm_rel=0.020374374464154243, ref_abs_avg=18.581233978271484, test_abs_avg=18.584264755249023
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3247232437133789, max_abs=1.5263671875, mean_rel=0.1191091239452362, max_rel=7.371807098388672, norm_rel=0.021006235852837563, ref_abs_avg=15.649442672729492, test_abs_avg=15.622076988220215
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.3995318114757538, max_abs=4.0, mean_rel=0.13958893716335297, max_rel=1590.789306640625, norm_rel=0.021931661292910576, ref_abs_avg=18.38957977294922, test_abs_avg=18.388717651367188
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3589332699775696, max_abs=3.5, mean_rel=0.16537559032440186, max_rel=1218.75, norm_rel=0.019850801676511765, ref_abs_avg=18.13667869567871, test_abs_avg=18.14101791381836
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.2804008722305298, max_abs=1.25, mean_rel=0.09016557037830353, max_rel=13.589765548706055, norm_rel=0.019088972359895706, ref_abs_avg=14.859167098999023, test_abs_avg=14.871293067932129
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.36826092004776, max_abs=4.0, mean_rel=0.1301019936800003, max_rel=581.3795776367188, norm_rel=0.021008992567658424, ref_abs_avg=17.729942321777344, test_abs_avg=17.730857849121094
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3323230445384979, max_abs=3.8125, mean_rel=0.16747191548347473, max_rel=1343.7498779296875, norm_rel=0.019195890054106712, ref_abs_avg=17.540678024291992, test_abs_avg=17.544776916503906
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.27074432373046875, max_abs=1.125, mean_rel=0.08096019178628922, max_rel=7.871026515960693, norm_rel=0.020679576322436333, ref_abs_avg=13.244653701782227, test_abs_avg=13.231996536254883
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3489692509174347, max_abs=3.6875, mean_rel=0.11789892613887787, max_rel=402.32244873046875, norm_rel=0.020907042548060417, ref_abs_avg=16.97920799255371, test_abs_avg=16.97735595703125
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.32324594259262085, max_abs=3.5, mean_rel=0.17921023070812225, max_rel=1328.1248779296875, norm_rel=0.019513236358761787, ref_abs_avg=16.944499969482422, test_abs_avg=16.93584442138672
production_forward2 vs paper_forward output: mean_abs=0.0016217887168750167, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008479209616780281, max_abs=0.390625, mean_rel=0.07404276728630066, max_rel=114.34967041015625, norm_rel=0.020173247903585434, ref_abs_avg=0.4524350166320801, test_abs_avg=0.4524257779121399
production_forward2 grad[1] vs paper_forward: mean_abs=7.211483478546143, max_abs=77.75, mean_rel=0.17091704905033112, max_rel=381.23516845703125, norm_rel=0.02005901373922825, ref_abs_avg=314.0838928222656, test_abs_avg=314.1716003417969
production_forward2 grad[2] vs paper_forward: mean_abs=1.3206100463867188, max_abs=5.5078125, mean_rel=0.0944175273180008, max_rel=5.137535572052002, norm_rel=0.02591048739850521, ref_abs_avg=51.695945739746094, test_abs_avg=51.671485900878906
production_forward2 grad[3] vs paper_forward: mean_abs=1.6150518655776978, max_abs=12.0, mean_rel=0.16956982016563416, max_rel=1192.3939208984375, norm_rel=0.0250835120677948, ref_abs_avg=64.81581115722656, test_abs_avg=64.81659698486328
production_forward2 grad[4] vs paper_forward: mean_abs=1.4972172975540161, max_abs=8.5, mean_rel=0.39121657609939575, max_rel=5093.75, norm_rel=0.02353748120367527, ref_abs_avg=63.85792541503906, test_abs_avg=63.85032272338867
production_forward2 grad[5] vs paper_forward: mean_abs=1.0203924179077148, max_abs=4.0, mean_rel=0.12590420246124268, max_rel=23.95912742614746, norm_rel=0.02216685190796852, ref_abs_avg=46.34657287597656, test_abs_avg=46.41808319091797
production_forward2 grad[6] vs paper_forward: mean_abs=1.402495265007019, max_abs=10.0, mean_rel=0.1655760258436203, max_rel=1911.6707763671875, norm_rel=0.024857347831130028, ref_abs_avg=56.77775192260742, test_abs_avg=56.772850036621094
production_forward2 grad[7] vs paper_forward: mean_abs=1.300611972808838, max_abs=7.96875, mean_rel=0.37115997076034546, max_rel=4187.5, norm_rel=0.023307235911488533, ref_abs_avg=56.00895690917969, test_abs_avg=56.00672149658203
production_forward2 grad[8] vs paper_forward: mean_abs=0.9974884986877441, max_abs=4.3125, mean_rel=0.16466432809829712, max_rel=25.910322189331055, norm_rel=0.024565940722823143, ref_abs_avg=41.21363067626953, test_abs_avg=41.21276092529297
production_forward2 grad[9] vs paper_forward: mean_abs=1.2613458633422852, max_abs=8.0, mean_rel=0.18059104681015015, max_rel=2045.3184814453125, norm_rel=0.02452694997191429, ref_abs_avg=51.71076965332031, test_abs_avg=51.714874267578125
production_forward2 grad[10] vs paper_forward: mean_abs=1.160658597946167, max_abs=7.0, mean_rel=0.3188934922218323, max_rel=3749.999755859375, norm_rel=0.022916778922080994, ref_abs_avg=50.914794921875, test_abs_avg=50.90709686279297
production_forward2 grad[11] vs paper_forward: mean_abs=0.9102716445922852, max_abs=4.0, mean_rel=0.1134767085313797, max_rel=14.649394035339355, norm_rel=0.0245228111743927, ref_abs_avg=37.534263610839844, test_abs_avg=37.44481658935547
production_forward2 grad[12] vs paper_forward: mean_abs=1.1677173376083374, max_abs=9.0, mean_rel=0.15683645009994507, max_rel=979.371826171875, norm_rel=0.024367427453398705, ref_abs_avg=48.20590591430664, test_abs_avg=48.203697204589844
production_forward2 grad[13] vs paper_forward: mean_abs=1.0777710676193237, max_abs=7.0, mean_rel=0.35067081451416016, max_rel=4031.249755859375, norm_rel=0.022763995453715324, ref_abs_avg=47.61308288574219, test_abs_avg=47.610923767089844
production_forward2 grad[14] vs paper_forward: mean_abs=0.8473828434944153, max_abs=3.375, mean_rel=1.002479910850525, max_rel=469.4719543457031, norm_rel=0.022129453718662262, ref_abs_avg=38.398475646972656, test_abs_avg=38.38123321533203
production_forward2 grad[15] vs paper_forward: mean_abs=1.09821355342865, max_abs=8.0, mean_rel=0.14861947298049927, max_rel=872.9212036132812, norm_rel=0.024198424071073532, ref_abs_avg=45.661468505859375, test_abs_avg=45.65904998779297
production_forward2 grad[16] vs paper_forward: mean_abs=1.0133858919143677, max_abs=6.0, mean_rel=0.3114021420478821, max_rel=3187.499755859375, norm_rel=0.022559216246008873, ref_abs_avg=45.16523361206055, test_abs_avg=45.15978240966797
production_forward2 grad[17] vs paper_forward: mean_abs=0.8452177047729492, max_abs=3.75, mean_rel=0.12464489042758942, max_rel=22.5472469329834, norm_rel=0.02337290719151497, ref_abs_avg=36.58955001831055, test_abs_avg=36.596893310546875
production_forward2 grad[18] vs paper_forward: mean_abs=1.0324428081512451, max_abs=8.0, mean_rel=0.17316393554210663, max_rel=1633.0440673828125, norm_rel=0.0240969005972147, ref_abs_avg=43.10002517700195, test_abs_avg=43.0980110168457
production_forward2 grad[19] vs paper_forward: mean_abs=0.9487621784210205, max_abs=5.75, mean_rel=0.2918367385864258, max_rel=3406.249755859375, norm_rel=0.022298434749245644, ref_abs_avg=42.7528076171875, test_abs_avg=42.75286102294922
production_forward2 grad[20] vs paper_forward: mean_abs=0.7728481292724609, max_abs=2.75, mean_rel=0.09661510586738586, max_rel=6.815266132354736, norm_rel=0.02304081991314888, ref_abs_avg=33.62725067138672, test_abs_avg=33.68337631225586
production_forward2 grad[21] vs paper_forward: mean_abs=0.9830193519592285, max_abs=6.5, mean_rel=0.14933982491493225, max_rel=1935.6790771484375, norm_rel=0.02393035590648651, ref_abs_avg=41.320228576660156, test_abs_avg=41.31775665283203
production_forward2 grad[22] vs paper_forward: mean_abs=0.9017259478569031, max_abs=6.0, mean_rel=0.313852459192276, max_rel=2562.5, norm_rel=0.022175122052431107, ref_abs_avg=40.88514709472656, test_abs_avg=40.88014221191406
production_forward2 grad[23] vs paper_forward: mean_abs=0.7046064138412476, max_abs=3.25, mean_rel=1.4512745141983032, max_rel=709.851806640625, norm_rel=0.022164171561598778, ref_abs_avg=32.42936325073242, test_abs_avg=32.429237365722656
production_forward2 grad[24] vs paper_forward: mean_abs=0.9316016435623169, max_abs=6.75, mean_rel=0.1583758294582367, max_rel=2056.992919921875, norm_rel=0.02373245544731617, ref_abs_avg=39.494239807128906, test_abs_avg=39.49237823486328
production_forward2 grad[25] vs paper_forward: mean_abs=0.8569704294204712, max_abs=5.25, mean_rel=0.2728881239891052, max_rel=2156.25, norm_rel=0.021906845271587372, ref_abs_avg=39.384620666503906, test_abs_avg=39.382667541503906
production_forward2 grad[26] vs paper_forward: mean_abs=0.8586294651031494, max_abs=4.0, mean_rel=0.09823761880397797, max_rel=15.00820255279541, norm_rel=0.023881100118160248, ref_abs_avg=36.23738479614258, test_abs_avg=36.2440185546875
production_forward2 grad[27] vs paper_forward: mean_abs=1.080027461051941, max_abs=7.5, mean_rel=0.17353783547878265, max_rel=1956.533447265625, norm_rel=0.02570895105600357, ref_abs_avg=42.27507019042969, test_abs_avg=42.27518081665039
production_forward2 grad[28] vs paper_forward: mean_abs=0.9999565482139587, max_abs=6.0, mean_rel=0.39517951011657715, max_rel=3406.249755859375, norm_rel=0.023996800184249878, ref_abs_avg=41.825439453125, test_abs_avg=41.826168060302734
production_forward2 grad[29] vs paper_forward: mean_abs=0.7864107489585876, max_abs=2.75, mean_rel=0.31428635120391846, max_rel=33.64459991455078, norm_rel=0.02479487843811512, ref_abs_avg=31.108558654785156, test_abs_avg=31.12777328491211
production_forward2 grad[30] vs paper_forward: mean_abs=0.9974215030670166, max_abs=7.0, mean_rel=0.17069639265537262, max_rel=2963.762939453125, norm_rel=0.02597201243042946, ref_abs_avg=38.608055114746094, test_abs_avg=38.60505676269531
production_forward2 grad[31] vs paper_forward: mean_abs=0.9372674226760864, max_abs=7.0, mean_rel=0.31111574172973633, max_rel=3156.249755859375, norm_rel=0.02460615523159504, ref_abs_avg=38.167057037353516, test_abs_avg=38.15816116333008
production_forward2 grad[32] vs paper_forward: mean_abs=0.7413525581359863, max_abs=2.75, mean_rel=0.29028844833374023, max_rel=39.18336868286133, norm_rel=0.023496439680457115, ref_abs_avg=31.432157516479492, test_abs_avg=31.427433013916016
production_forward2 grad[33] vs paper_forward: mean_abs=0.9281865358352661, max_abs=6.0, mean_rel=0.17029660940170288, max_rel=1280.2249755859375, norm_rel=0.025628039613366127, ref_abs_avg=36.377220153808594, test_abs_avg=36.376197814941406
production_forward2 grad[34] vs paper_forward: mean_abs=0.865146815776825, max_abs=5.375, mean_rel=0.301047682762146, max_rel=2375.0, norm_rel=0.024096323177218437, ref_abs_avg=35.991676330566406, test_abs_avg=35.994144439697266
production_forward2 grad[35] vs paper_forward: mean_abs=0.6894795894622803, max_abs=3.0, mean_rel=0.1415833681821823, max_rel=18.2943058013916, norm_rel=0.02516535483300686, ref_abs_avg=26.90456199645996, test_abs_avg=26.918426513671875
production_forward2 grad[36] vs paper_forward: mean_abs=0.8689360618591309, max_abs=6.0, mean_rel=0.1786985695362091, max_rel=1963.4586181640625, norm_rel=0.025376448407769203, ref_abs_avg=34.353328704833984, test_abs_avg=34.35026168823242
production_forward2 grad[37] vs paper_forward: mean_abs=0.8183212280273438, max_abs=4.75, mean_rel=0.3230394124984741, max_rel=2890.624755859375, norm_rel=0.024169564247131348, ref_abs_avg=33.95781707763672, test_abs_avg=33.95989990234375
production_forward2 grad[38] vs paper_forward: mean_abs=0.6369819641113281, max_abs=2.5, mean_rel=0.08862128853797913, max_rel=4.566837310791016, norm_rel=0.023417485877871513, ref_abs_avg=27.78165054321289, test_abs_avg=27.821205139160156
production_forward2 grad[39] vs paper_forward: mean_abs=0.816525399684906, max_abs=6.0, mean_rel=0.16303998231887817, max_rel=2040.518310546875, norm_rel=0.025218253955245018, ref_abs_avg=32.466102600097656, test_abs_avg=32.46576690673828
production_forward2 grad[40] vs paper_forward: mean_abs=0.7640233039855957, max_abs=4.625, mean_rel=0.29242655634880066, max_rel=2874.999755859375, norm_rel=0.023641224950551987, ref_abs_avg=32.355186462402344, test_abs_avg=32.35186004638672
production_forward2 grad[41] vs paper_forward: mean_abs=0.6268386840820312, max_abs=2.25, mean_rel=0.10420605540275574, max_rel=9.09035587310791, norm_rel=0.024675728753209114, ref_abs_avg=25.678850173950195, test_abs_avg=25.75364112854004
production_forward2 grad[42] vs paper_forward: mean_abs=0.774763286113739, max_abs=6.5, mean_rel=0.15982988476753235, max_rel=913.7333374023438, norm_rel=0.024828536435961723, ref_abs_avg=31.29768180847168, test_abs_avg=31.29452133178711
production_forward2 grad[43] vs paper_forward: mean_abs=0.7157411575317383, max_abs=4.0, mean_rel=0.2568660080432892, max_rel=2562.5, norm_rel=0.023249797523021698, ref_abs_avg=30.82806396484375, test_abs_avg=30.829593658447266
production_forward2 grad[44] vs paper_forward: mean_abs=0.5762917995452881, max_abs=2.125, mean_rel=0.08578821271657944, max_rel=6.194764614105225, norm_rel=0.023303257301449776, ref_abs_avg=24.554292678833008, test_abs_avg=24.571468353271484
production_forward2 grad[45] vs paper_forward: mean_abs=0.736161470413208, max_abs=5.5, mean_rel=0.16124600172042847, max_rel=765.66015625, norm_rel=0.024611487984657288, ref_abs_avg=30.016223907470703, test_abs_avg=30.01314926147461
production_forward2 grad[46] vs paper_forward: mean_abs=0.6838032007217407, max_abs=4.5, mean_rel=0.29201868176460266, max_rel=1874.9998779296875, norm_rel=0.023299258202314377, ref_abs_avg=29.46951675415039, test_abs_avg=29.473230361938477
production_forward2 grad[47] vs paper_forward: mean_abs=0.5327897071838379, max_abs=2.25, mean_rel=0.09175485372543335, max_rel=6.448357582092285, norm_rel=0.02351709082722664, ref_abs_avg=23.75469970703125, test_abs_avg=23.708423614501953
production_forward2 grad[48] vs paper_forward: mean_abs=0.7008894681930542, max_abs=8.0, mean_rel=0.15634587407112122, max_rel=1010.1810302734375, norm_rel=0.024535436183214188, ref_abs_avg=28.6768798828125, test_abs_avg=28.673721313476562
production_forward2 grad[49] vs paper_forward: mean_abs=0.652427077293396, max_abs=4.1171875, mean_rel=0.26114150881767273, max_rel=2031.2498779296875, norm_rel=0.022932391613721848, ref_abs_avg=28.48326873779297, test_abs_avg=28.483196258544922
production_forward2 grad[50] vs paper_forward: mean_abs=0.6263952255249023, max_abs=2.5, mean_rel=0.09799700230360031, max_rel=9.524930000305176, norm_rel=0.024936366826295853, ref_abs_avg=25.58939552307129, test_abs_avg=25.61581039428711
production_forward2 grad[51] vs paper_forward: mean_abs=0.7879253625869751, max_abs=5.75, mean_rel=0.17333978414535522, max_rel=1313.3328857421875, norm_rel=0.025543024763464928, ref_abs_avg=30.96274185180664, test_abs_avg=30.961334228515625
production_forward2 grad[52] vs paper_forward: mean_abs=0.7372660636901855, max_abs=4.5, mean_rel=0.28862088918685913, max_rel=1718.7498779296875, norm_rel=0.02437436208128929, ref_abs_avg=30.39059066772461, test_abs_avg=30.392719268798828
production_forward2 grad[53] vs paper_forward: mean_abs=0.5764203071594238, max_abs=2.0, mean_rel=0.09203220903873444, max_rel=4.945109844207764, norm_rel=0.02425454370677471, ref_abs_avg=24.05785369873047, test_abs_avg=24.08887481689453
production_forward2 grad[54] vs paper_forward: mean_abs=0.732745885848999, max_abs=5.25, mean_rel=0.16072091460227966, max_rel=965.05810546875, norm_rel=0.025275466963648796, ref_abs_avg=29.054903030395508, test_abs_avg=29.05331802368164
production_forward2 grad[55] vs paper_forward: mean_abs=0.677947998046875, max_abs=4.5, mean_rel=0.3052639067173004, max_rel=3374.999755859375, norm_rel=0.02385149896144867, ref_abs_avg=28.4444522857666, test_abs_avg=28.449012756347656
production_forward2 grad[56] vs paper_forward: mean_abs=0.5043540000915527, max_abs=2.37890625, mean_rel=0.1042521744966507, max_rel=8.344489097595215, norm_rel=0.022593120113015175, ref_abs_avg=21.982091903686523, test_abs_avg=21.977554321289062
production_forward2 grad[57] vs paper_forward: mean_abs=0.6760083436965942, max_abs=5.0, mean_rel=0.16230884194374084, max_rel=1177.3199462890625, norm_rel=0.024789592251181602, ref_abs_avg=27.327598571777344, test_abs_avg=27.32416534423828
production_forward2 grad[58] vs paper_forward: mean_abs=0.6332978010177612, max_abs=4.9375, mean_rel=0.26117581129074097, max_rel=2281.25, norm_rel=0.023459427058696747, ref_abs_avg=27.056968688964844, test_abs_avg=27.05870246887207
production_forward2 grad[59] vs paper_forward: mean_abs=0.4987344741821289, max_abs=1.8125, mean_rel=0.11950820684432983, max_rel=13.217591285705566, norm_rel=0.02263784594833851, ref_abs_avg=22.12683868408203, test_abs_avg=22.129926681518555
production_forward2 grad[60] vs paper_forward: mean_abs=0.6368629932403564, max_abs=4.75, mean_rel=0.14664427936077118, max_rel=1047.2864990234375, norm_rel=0.02419336698949337, ref_abs_avg=26.361270904541016, test_abs_avg=26.361305236816406
production_forward2 grad[61] vs paper_forward: mean_abs=0.5877877473831177, max_abs=4.0, mean_rel=0.20139816403388977, max_rel=2140.625, norm_rel=0.02287895418703556, ref_abs_avg=25.74637222290039, test_abs_avg=25.738332748413086
production_forward2 grad[62] vs paper_forward: mean_abs=0.4622197151184082, max_abs=1.875, mean_rel=0.07445485889911652, max_rel=4.256077766418457, norm_rel=0.02178225666284561, ref_abs_avg=21.451648712158203, test_abs_avg=21.486103057861328
production_forward2 grad[63] vs paper_forward: mean_abs=0.5992422103881836, max_abs=4.25, mean_rel=0.1629454344511032, max_rel=1071.78857421875, norm_rel=0.023944510146975517, ref_abs_avg=25.05724334716797, test_abs_avg=25.0560302734375
production_forward2 grad[64] vs paper_forward: mean_abs=0.5466548800468445, max_abs=4.0, mean_rel=0.2557595372200012, max_rel=1906.2498779296875, norm_rel=0.02240830473601818, ref_abs_avg=24.444364547729492, test_abs_avg=24.446496963500977
production_forward2 grad[65] vs paper_forward: mean_abs=0.43218135833740234, max_abs=1.8125, mean_rel=0.13566932082176208, max_rel=18.916732788085938, norm_rel=0.02087562531232834, ref_abs_avg=20.376632690429688, test_abs_avg=20.370304107666016
production_forward2 grad[66] vs paper_forward: mean_abs=0.5709913969039917, max_abs=5.0, mean_rel=0.1482420265674591, max_rel=1136.3814697265625, norm_rel=0.023392612114548683, ref_abs_avg=24.439380645751953, test_abs_avg=24.43954849243164
production_forward2 grad[67] vs paper_forward: mean_abs=0.5176161527633667, max_abs=4.0, mean_rel=0.22768434882164001, max_rel=1578.1248779296875, norm_rel=0.02167077176272869, ref_abs_avg=23.885929107666016, test_abs_avg=23.88817596435547
production_forward2 grad[68] vs paper_forward: mean_abs=0.4258003234863281, max_abs=1.671875, mean_rel=0.09090500324964523, max_rel=6.138769626617432, norm_rel=0.02412271499633789, ref_abs_avg=18.075881958007812, test_abs_avg=18.11404037475586
production_forward2 grad[69] vs paper_forward: mean_abs=0.5361793041229248, max_abs=4.0, mean_rel=0.14502966403961182, max_rel=899.2106323242188, norm_rel=0.02313370257616043, ref_abs_avg=23.23097801208496, test_abs_avg=23.23052978515625
production_forward2 grad[70] vs paper_forward: mean_abs=0.49868375062942505, max_abs=3.5, mean_rel=0.20286458730697632, max_rel=1156.25, norm_rel=0.02157055214047432, ref_abs_avg=23.151782989501953, test_abs_avg=23.153186798095703
production_forward2 grad[71] vs paper_forward: mean_abs=0.41717541217803955, max_abs=1.607421875, mean_rel=0.09723930060863495, max_rel=8.107416152954102, norm_rel=0.022189470008015633, ref_abs_avg=18.986316680908203, test_abs_avg=18.979446411132812
production_forward2 grad[72] vs paper_forward: mean_abs=0.5190050005912781, max_abs=5.0, mean_rel=0.14111757278442383, max_rel=1171.87109375, norm_rel=0.022704679518938065, ref_abs_avg=22.880176544189453, test_abs_avg=22.878402709960938
production_forward2 grad[73] vs paper_forward: mean_abs=0.4805418848991394, max_abs=4.0, mean_rel=0.17793533205986023, max_rel=1125.0, norm_rel=0.021466387435793877, ref_abs_avg=22.46895980834961, test_abs_avg=22.47064208984375
production_forward2 grad[74] vs paper_forward: mean_abs=0.43831682205200195, max_abs=1.578125, mean_rel=0.08961942791938782, max_rel=12.84039306640625, norm_rel=0.022857625037431717, ref_abs_avg=18.983806610107422, test_abs_avg=18.998477935791016
production_forward2 grad[75] vs paper_forward: mean_abs=0.5467914938926697, max_abs=4.5, mean_rel=0.14777769148349762, max_rel=1027.9586181640625, norm_rel=0.02445402927696705, ref_abs_avg=22.401782989501953, test_abs_avg=22.399017333984375
production_forward2 grad[76] vs paper_forward: mean_abs=0.5053654313087463, max_abs=3.5, mean_rel=0.22924718260765076, max_rel=2093.75, norm_rel=0.02268308773636818, ref_abs_avg=22.303537368774414, test_abs_avg=22.30432891845703
production_forward2 grad[77] vs paper_forward: mean_abs=0.40990209579467773, max_abs=1.96875, mean_rel=0.08629628270864487, max_rel=9.87664794921875, norm_rel=0.024160677567124367, ref_abs_avg=17.021465301513672, test_abs_avg=17.028032302856445
production_forward2 grad[78] vs paper_forward: mean_abs=0.5097932815551758, max_abs=4.5, mean_rel=0.15301057696342468, max_rel=1052.7581787109375, norm_rel=0.024052614346146584, ref_abs_avg=21.223621368408203, test_abs_avg=21.22262954711914
production_forward2 grad[79] vs paper_forward: mean_abs=0.4724527895450592, max_abs=4.25, mean_rel=0.2369856834411621, max_rel=1781.2498779296875, norm_rel=0.022928236052393913, ref_abs_avg=20.673660278320312, test_abs_avg=20.672842025756836
production_forward2 grad[80] vs paper_forward: mean_abs=0.3744640350341797, max_abs=1.28125, mean_rel=0.10212981700897217, max_rel=6.225525379180908, norm_rel=0.022544868290424347, ref_abs_avg=16.05415153503418, test_abs_avg=16.03428077697754
production_forward2 grad[81] vs paper_forward: mean_abs=0.4700736999511719, max_abs=4.0, mean_rel=0.15457233786582947, max_rel=1270.8681640625, norm_rel=0.023420238867402077, ref_abs_avg=20.118576049804688, test_abs_avg=20.117542266845703
production_forward2 grad[82] vs paper_forward: mean_abs=0.4394066333770752, max_abs=3.65625, mean_rel=0.20914483070373535, max_rel=1499.9998779296875, norm_rel=0.022181332111358643, ref_abs_avg=19.930217742919922, test_abs_avg=19.940593719482422
production_forward2 grad[83] vs paper_forward: mean_abs=0.33646470308303833, max_abs=1.5, mean_rel=0.31213197112083435, max_rel=117.529541015625, norm_rel=0.020912764593958855, ref_abs_avg=16.328575134277344, test_abs_avg=16.34309196472168
production_forward2 grad[84] vs paper_forward: mean_abs=0.4452901780605316, max_abs=6.0, mean_rel=0.14089691638946533, max_rel=1040.66162109375, norm_rel=0.02279418334364891, ref_abs_avg=19.60647201538086, test_abs_avg=19.605052947998047
production_forward2 grad[85] vs paper_forward: mean_abs=0.40671205520629883, max_abs=3.84375, mean_rel=0.1949683278799057, max_rel=890.6249389648438, norm_rel=0.021810125559568405, ref_abs_avg=18.807056427001953, test_abs_avg=18.809860229492188
production_forward2 grad[86] vs paper_forward: mean_abs=0.31220436096191406, max_abs=1.125, mean_rel=0.08149732649326324, max_rel=13.779417037963867, norm_rel=0.020167291164398193, ref_abs_avg=15.391060829162598, test_abs_avg=15.35305118560791
production_forward2 grad[87] vs paper_forward: mean_abs=0.41338545083999634, max_abs=5.5, mean_rel=0.13466058671474457, max_rel=718.9327392578125, norm_rel=0.022124774754047394, ref_abs_avg=18.79989242553711, test_abs_avg=18.799108505249023
production_forward2 grad[88] vs paper_forward: mean_abs=0.3744440972805023, max_abs=3.5, mean_rel=0.17088691890239716, max_rel=1312.4998779296875, norm_rel=0.0201428085565567, ref_abs_avg=18.581233978271484, test_abs_avg=18.580463409423828
production_forward2 grad[89] vs paper_forward: mean_abs=0.3133831024169922, max_abs=1.34375, mean_rel=0.10937133431434631, max_rel=7.327123641967773, norm_rel=0.020484667271375656, ref_abs_avg=15.649442672729492, test_abs_avg=15.625632286071777
production_forward2 grad[90] vs paper_forward: mean_abs=0.39877888560295105, max_abs=3.5, mean_rel=0.13899576663970947, max_rel=1094.5592041015625, norm_rel=0.021885981783270836, ref_abs_avg=18.38957977294922, test_abs_avg=18.389802932739258
production_forward2 grad[91] vs paper_forward: mean_abs=0.36073875427246094, max_abs=3.28125, mean_rel=0.16641521453857422, max_rel=1156.25, norm_rel=0.019950585439801216, ref_abs_avg=18.13667869567871, test_abs_avg=18.139591217041016
production_forward2 grad[92] vs paper_forward: mean_abs=0.2766000032424927, max_abs=1.25, mean_rel=0.11774703115224838, max_rel=14.870771408081055, norm_rel=0.0189767237752676, ref_abs_avg=14.859167098999023, test_abs_avg=14.865888595581055
production_forward2 grad[93] vs paper_forward: mean_abs=0.3680262565612793, max_abs=4.0, mean_rel=0.1300671547651291, max_rel=814.0060424804688, norm_rel=0.021006455644965172, ref_abs_avg=17.729942321777344, test_abs_avg=17.73078155517578
production_forward2 grad[94] vs paper_forward: mean_abs=0.33155956864356995, max_abs=3.0, mean_rel=0.16778507828712463, max_rel=1218.75, norm_rel=0.019112704321742058, ref_abs_avg=17.540678024291992, test_abs_avg=17.545059204101562
production_forward2 grad[95] vs paper_forward: mean_abs=0.2761402130126953, max_abs=1.125, mean_rel=0.08443284034729004, max_rel=8.630289077758789, norm_rel=0.021002130582928658, ref_abs_avg=13.244653701782227, test_abs_avg=13.24276351928711
production_forward2 grad[96] vs paper_forward: mean_abs=0.34899818897247314, max_abs=4.0, mean_rel=0.11943434178829193, max_rel=505.67230224609375, norm_rel=0.020897749811410904, ref_abs_avg=16.97920799255371, test_abs_avg=16.977466583251953
production_forward2 grad[97] vs paper_forward: mean_abs=0.3240957260131836, max_abs=3.75, mean_rel=0.17392054200172424, max_rel=1437.4998779296875, norm_rel=0.01954592950642109, ref_abs_avg=16.944499969482422, test_abs_avg=16.941612243652344
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  165.916 ms
torch_compile_phases_forward bwd-only: 132.532 ms
torch_compile_phases_forward peak allocated: fwd=13.078 GiB, fwd+bwd=13.706 GiB
torch_compile_phases_forward peak reserved:  fwd=13.375 GiB, fwd+bwd=17.627 GiB
production_forward fwd+bwd:  116.302 ms
production_forward bwd-only: 95.882 ms
production_forward peak allocated: fwd=3.368 GiB, fwd+bwd=10.493 GiB
production_forward peak reserved:  fwd=3.604 GiB, fwd+bwd=11.604 GiB
production_forward2 fwd+bwd:  191.474 ms
production_forward2 bwd-only: 172.378 ms
production_forward2 peak allocated: fwd=2.864 GiB, fwd+bwd=6.243 GiB
production_forward2 peak reserved:  fwd=3.229 GiB, fwd+bwd=8.979 GiB
paper_forward fwd+bwd:  384.541 ms
paper_forward bwd-only: 304.162 ms
paper_forward peak allocated: fwd=30.003 GiB, fwd+bwd=32.122 GiB
paper_forward peak reserved:  fwd=30.021 GiB, fwd+bwd=32.771 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016602205578237772, max_abs=0.0391845703125
production_forward grad[0] vs paper_forward: mean_abs=0.00838843546807766, max_abs=0.4375, mean_rel=0.07209163904190063, max_rel=123.42272186279297, norm_rel=0.019816488027572632, ref_abs_avg=0.4613272547721863, test_abs_avg=0.4613381028175354
production_forward grad[1] vs paper_forward: mean_abs=7.329070568084717, max_abs=60.0, mean_rel=0.13038165867328644, max_rel=239.97714233398438, norm_rel=0.020192719995975494, ref_abs_avg=323.8179016113281, test_abs_avg=323.86993408203125
production_forward grad[2] vs paper_forward: mean_abs=1.223556399345398, max_abs=5.5, mean_rel=0.17503246665000916, max_rel=39.31794357299805, norm_rel=0.023622028529644012, ref_abs_avg=52.27435302734375, test_abs_avg=52.22895812988281
production_forward grad[3] vs paper_forward: mean_abs=1.5828545093536377, max_abs=11.5, mean_rel=0.1675289422273636, max_rel=2323.63525390625, norm_rel=0.024566125124692917, ref_abs_avg=64.87458038330078, test_abs_avg=64.87779235839844
production_forward grad[4] vs paper_forward: mean_abs=1.4577406644821167, max_abs=8.5, mean_rel=0.4079042673110962, max_rel=4812.5, norm_rel=0.02302524447441101, ref_abs_avg=63.66494369506836, test_abs_avg=63.66567611694336
production_forward grad[5] vs paper_forward: mean_abs=1.045374870300293, max_abs=4.5, mean_rel=0.06714464724063873, max_rel=3.4411816596984863, norm_rel=0.02288660779595375, ref_abs_avg=46.364498138427734, test_abs_avg=46.363685607910156
production_forward grad[6] vs paper_forward: mean_abs=1.405137300491333, max_abs=9.0, mean_rel=0.16694889962673187, max_rel=2342.940185546875, norm_rel=0.024296315386891365, ref_abs_avg=58.222412109375, test_abs_avg=58.225318908691406
production_forward grad[7] vs paper_forward: mean_abs=1.304176926612854, max_abs=8.75, mean_rel=0.4321373701095581, max_rel=4500.0, norm_rel=0.02273848094046116, ref_abs_avg=57.59020233154297, test_abs_avg=57.58373260498047
production_forward grad[8] vs paper_forward: mean_abs=0.9908294677734375, max_abs=4.75, mean_rel=0.09138964116573334, max_rel=7.509380340576172, norm_rel=0.02257291041314602, ref_abs_avg=44.22251892089844, test_abs_avg=44.24488830566406
production_forward grad[9] vs paper_forward: mean_abs=1.2882810831069946, max_abs=8.5, mean_rel=0.17936110496520996, max_rel=3049.787841796875, norm_rel=0.02419147826731205, ref_abs_avg=53.65225601196289, test_abs_avg=53.65631866455078
production_forward grad[10] vs paper_forward: mean_abs=1.1834027767181396, max_abs=7.0, mean_rel=0.37327975034713745, max_rel=4125.0, norm_rel=0.02265852689743042, ref_abs_avg=52.507110595703125, test_abs_avg=52.501522064208984
production_forward grad[11] vs paper_forward: mean_abs=0.8873164653778076, max_abs=3.25, mean_rel=0.21040791273117065, max_rel=63.92845153808594, norm_rel=0.02207103744149208, ref_abs_avg=40.313507080078125, test_abs_avg=40.3675651550293
production_forward grad[12] vs paper_forward: mean_abs=1.1853735446929932, max_abs=8.5, mean_rel=0.16381622850894928, max_rel=1785.416015625, norm_rel=0.02410026825964451, ref_abs_avg=49.50634765625, test_abs_avg=49.50649642944336
production_forward grad[13] vs paper_forward: mean_abs=1.0968152284622192, max_abs=7.0, mean_rel=0.30722612142562866, max_rel=3874.999755859375, norm_rel=0.022411596029996872, ref_abs_avg=49.221893310546875, test_abs_avg=49.222679138183594
production_forward grad[14] vs paper_forward: mean_abs=0.8893094062805176, max_abs=3.875, mean_rel=0.10287792980670929, max_rel=5.9445271492004395, norm_rel=0.024870583787560463, ref_abs_avg=34.59674835205078, test_abs_avg=34.77233123779297
production_forward grad[15] vs paper_forward: mean_abs=1.1107057332992554, max_abs=8.0, mean_rel=0.17241951823234558, max_rel=2213.205322265625, norm_rel=0.02389560267329216, ref_abs_avg=46.76941680908203, test_abs_avg=46.77373504638672
production_forward grad[16] vs paper_forward: mean_abs=1.0159646272659302, max_abs=6.375, mean_rel=0.2607119083404541, max_rel=2937.499755859375, norm_rel=0.022206798195838928, ref_abs_avg=46.01008224487305, test_abs_avg=46.01055145263672
production_forward grad[17] vs paper_forward: mean_abs=0.8097844123840332, max_abs=3.25, mean_rel=0.10857094824314117, max_rel=9.303112983703613, norm_rel=0.021516622975468636, ref_abs_avg=36.993980407714844, test_abs_avg=36.95841979980469
production_forward grad[18] vs paper_forward: mean_abs=1.054814100265503, max_abs=8.0, mean_rel=0.16441752016544342, max_rel=1037.724365234375, norm_rel=0.023798182606697083, ref_abs_avg=44.61968994140625, test_abs_avg=44.620208740234375
production_forward grad[19] vs paper_forward: mean_abs=0.9633810520172119, max_abs=6.0, mean_rel=0.3339502811431885, max_rel=2500.0, norm_rel=0.021990342065691948, ref_abs_avg=44.02014923095703, test_abs_avg=44.02028274536133
production_forward grad[20] vs paper_forward: mean_abs=0.7471427917480469, max_abs=3.25, mean_rel=0.06909240037202835, max_rel=4.0216965675354, norm_rel=0.02170347049832344, ref_abs_avg=34.92405700683594, test_abs_avg=34.91701889038086
production_forward grad[21] vs paper_forward: mean_abs=0.9925837516784668, max_abs=7.0, mean_rel=0.16710376739501953, max_rel=1663.5731201171875, norm_rel=0.023690564557909966, ref_abs_avg=42.16720199584961, test_abs_avg=42.17234802246094
production_forward grad[22] vs paper_forward: mean_abs=0.9103183746337891, max_abs=5.75, mean_rel=0.30793869495391846, max_rel=2812.499755859375, norm_rel=0.021942365914583206, ref_abs_avg=41.691978454589844, test_abs_avg=41.69518280029297
production_forward grad[23] vs paper_forward: mean_abs=0.7638792991638184, max_abs=3.0, mean_rel=0.1425826996564865, max_rel=21.207046508789062, norm_rel=0.024172410368919373, ref_abs_avg=32.034175872802734, test_abs_avg=31.98632049560547
production_forward grad[24] vs paper_forward: mean_abs=0.9445720911026001, max_abs=6.0, mean_rel=0.17019930481910706, max_rel=1537.7579345703125, norm_rel=0.02343314327299595, ref_abs_avg=40.562591552734375, test_abs_avg=40.563133239746094
production_forward grad[25] vs paper_forward: mean_abs=0.8689689040184021, max_abs=5.625, mean_rel=0.26993000507354736, max_rel=1999.9998779296875, norm_rel=0.02196451462805271, ref_abs_avg=39.797645568847656, test_abs_avg=39.801597595214844
production_forward grad[26] vs paper_forward: mean_abs=0.8678967952728271, max_abs=3.5, mean_rel=0.1538567692041397, max_rel=18.58887481689453, norm_rel=0.024477137252688408, ref_abs_avg=34.44279479980469, test_abs_avg=34.42539596557617
production_forward grad[27] vs paper_forward: mean_abs=1.1068024635314941, max_abs=7.25, mean_rel=0.15989458560943604, max_rel=1455.168212890625, norm_rel=0.025245461612939835, ref_abs_avg=44.08058547973633, test_abs_avg=44.08076095581055
production_forward grad[28] vs paper_forward: mean_abs=1.0215144157409668, max_abs=7.0, mean_rel=0.38576415181159973, max_rel=2812.499755859375, norm_rel=0.02357437275350094, ref_abs_avg=43.533294677734375, test_abs_avg=43.53091812133789
production_forward grad[29] vs paper_forward: mean_abs=0.8094267845153809, max_abs=3.0234375, mean_rel=0.09837557375431061, max_rel=6.7503981590271, norm_rel=0.024195009842514992, ref_abs_avg=33.34807205200195, test_abs_avg=33.38688659667969
production_forward grad[30] vs paper_forward: mean_abs=1.0104986429214478, max_abs=7.5, mean_rel=0.18221193552017212, max_rel=2336.278076171875, norm_rel=0.025616148486733437, ref_abs_avg=39.592552185058594, test_abs_avg=39.591949462890625
production_forward grad[31] vs paper_forward: mean_abs=0.9479422569274902, max_abs=5.5, mean_rel=0.27275800704956055, max_rel=2624.999755859375, norm_rel=0.024262938648462296, ref_abs_avg=39.24968719482422, test_abs_avg=39.25053787231445
production_forward grad[32] vs paper_forward: mean_abs=0.7808780670166016, max_abs=2.75, mean_rel=0.1067766472697258, max_rel=10.132410049438477, norm_rel=0.025236070156097412, ref_abs_avg=31.24566650390625, test_abs_avg=31.24092674255371
production_forward grad[33] vs paper_forward: mean_abs=0.9499373435974121, max_abs=6.25, mean_rel=0.18309248983860016, max_rel=3168.5263671875, norm_rel=0.025456059724092484, ref_abs_avg=37.48176193237305, test_abs_avg=37.482704162597656
production_forward grad[34] vs paper_forward: mean_abs=0.8820422887802124, max_abs=6.0, mean_rel=0.25126975774765015, max_rel=2250.0, norm_rel=0.024075428023934364, ref_abs_avg=36.75504684448242, test_abs_avg=36.75920104980469
production_forward grad[35] vs paper_forward: mean_abs=0.6828813552856445, max_abs=3.375, mean_rel=0.07943090796470642, max_rel=6.089137554168701, norm_rel=0.024848733097314835, ref_abs_avg=28.61974334716797, test_abs_avg=28.623676300048828
production_forward grad[36] vs paper_forward: mean_abs=0.8848560452461243, max_abs=6.0, mean_rel=0.18262185156345367, max_rel=1769.7127685546875, norm_rel=0.025209276005625725, ref_abs_avg=35.24900436401367, test_abs_avg=35.25017166137695
production_forward grad[37] vs paper_forward: mean_abs=0.8319454789161682, max_abs=5.25, mean_rel=0.33991366624832153, max_rel=2906.249755859375, norm_rel=0.023877369239926338, ref_abs_avg=34.92790985107422, test_abs_avg=34.926414489746094
production_forward grad[38] vs paper_forward: mean_abs=0.6633005142211914, max_abs=2.625, mean_rel=0.23980283737182617, max_rel=54.714996337890625, norm_rel=0.0246733408421278, ref_abs_avg=27.149349212646484, test_abs_avg=27.202884674072266
production_forward grad[39] vs paper_forward: mean_abs=0.8358252048492432, max_abs=5.5, mean_rel=0.16335590183734894, max_rel=1222.2479248046875, norm_rel=0.02489917166531086, ref_abs_avg=33.651573181152344, test_abs_avg=33.6540412902832
production_forward grad[40] vs paper_forward: mean_abs=0.7781515717506409, max_abs=5.125, mean_rel=0.3204079270362854, max_rel=3312.499755859375, norm_rel=0.02343527041375637, ref_abs_avg=33.293155670166016, test_abs_avg=33.294464111328125
production_forward grad[41] vs paper_forward: mean_abs=0.6350914239883423, max_abs=2.5, mean_rel=0.9644345641136169, max_rel=418.15673828125, norm_rel=0.023903662338852882, ref_abs_avg=26.457294464111328, test_abs_avg=26.479480743408203
production_forward grad[42] vs paper_forward: mean_abs=0.7933903932571411, max_abs=5.5, mean_rel=0.16341519355773926, max_rel=1419.2113037109375, norm_rel=0.024660944938659668, ref_abs_avg=32.27116394042969, test_abs_avg=32.271202087402344
production_forward grad[43] vs paper_forward: mean_abs=0.7368142604827881, max_abs=4.5, mean_rel=0.26317787170410156, max_rel=2921.874755859375, norm_rel=0.02326001040637493, ref_abs_avg=31.74160385131836, test_abs_avg=31.741897583007812
production_forward grad[44] vs paper_forward: mean_abs=0.5528564453125, max_abs=2.25, mean_rel=0.12386912107467651, max_rel=10.313928604125977, norm_rel=0.02178201451897621, ref_abs_avg=24.250751495361328, test_abs_avg=24.276809692382812
production_forward grad[45] vs paper_forward: mean_abs=0.7624461054801941, max_abs=5.5, mean_rel=0.15914544463157654, max_rel=1144.87744140625, norm_rel=0.02446625754237175, ref_abs_avg=31.186668395996094, test_abs_avg=31.186565399169922
production_forward grad[46] vs paper_forward: mean_abs=0.7062210440635681, max_abs=4.0, mean_rel=0.21725189685821533, max_rel=1999.9998779296875, norm_rel=0.023281333968043327, ref_abs_avg=30.42008399963379, test_abs_avg=30.422168731689453
production_forward grad[47] vs paper_forward: mean_abs=0.5756142139434814, max_abs=2.140625, mean_rel=0.07165013253688812, max_rel=2.497037410736084, norm_rel=0.022826949134469032, ref_abs_avg=25.40735626220703, test_abs_avg=25.40605926513672
production_forward grad[48] vs paper_forward: mean_abs=0.7250147461891174, max_abs=5.0, mean_rel=0.16418468952178955, max_rel=1501.551025390625, norm_rel=0.024268191307783127, ref_abs_avg=29.98481559753418, test_abs_avg=29.987836837768555
production_forward grad[49] vs paper_forward: mean_abs=0.6773777008056641, max_abs=4.5, mean_rel=0.21618258953094482, max_rel=2125.0, norm_rel=0.023096561431884766, ref_abs_avg=29.379653930664062, test_abs_avg=29.38041114807129
production_forward grad[50] vs paper_forward: mean_abs=0.606257438659668, max_abs=2.125, mean_rel=0.11323675513267517, max_rel=6.885772228240967, norm_rel=0.022130897268652916, ref_abs_avg=27.551250457763672, test_abs_avg=27.558134078979492
production_forward grad[51] vs paper_forward: mean_abs=0.7955960631370544, max_abs=5.5, mean_rel=0.16678906977176666, max_rel=889.9237670898438, norm_rel=0.025591738522052765, ref_abs_avg=31.20907974243164, test_abs_avg=31.214969635009766
production_forward grad[52] vs paper_forward: mean_abs=0.7347284555435181, max_abs=4.5625, mean_rel=0.2716931104660034, max_rel=2312.5, norm_rel=0.023983262479305267, ref_abs_avg=30.71146583557129, test_abs_avg=30.720661163330078
production_forward grad[53] vs paper_forward: mean_abs=0.5591707229614258, max_abs=2.0, mean_rel=0.1681184619665146, max_rel=24.93421745300293, norm_rel=0.023132627829909325, ref_abs_avg=23.92996597290039, test_abs_avg=23.914918899536133
production_forward grad[54] vs paper_forward: mean_abs=0.73093181848526, max_abs=5.0, mean_rel=0.16226601600646973, max_rel=1038.4722900390625, norm_rel=0.025578808039426804, ref_abs_avg=28.611303329467773, test_abs_avg=28.610580444335938
production_forward grad[55] vs paper_forward: mean_abs=0.6772396564483643, max_abs=4.4375, mean_rel=0.23100855946540833, max_rel=1828.1248779296875, norm_rel=0.023992178961634636, ref_abs_avg=28.31093978881836, test_abs_avg=28.310440063476562
production_forward grad[56] vs paper_forward: mean_abs=0.5350675582885742, max_abs=2.375, mean_rel=0.10264523327350616, max_rel=9.604915618896484, norm_rel=0.024222251027822495, ref_abs_avg=22.265472412109375, test_abs_avg=22.258024215698242
production_forward grad[57] vs paper_forward: mean_abs=0.6860743761062622, max_abs=5.0, mean_rel=0.1574108600616455, max_rel=722.1414794921875, norm_rel=0.024915795773267746, ref_abs_avg=27.59450340270996, test_abs_avg=27.593730926513672
production_forward grad[58] vs paper_forward: mean_abs=0.6362603306770325, max_abs=4.3125, mean_rel=0.2697475552558899, max_rel=1624.9998779296875, norm_rel=0.023724598810076714, ref_abs_avg=26.897722244262695, test_abs_avg=26.895069122314453
production_forward grad[59] vs paper_forward: mean_abs=0.5150848627090454, max_abs=2.25, mean_rel=0.0975947454571724, max_rel=14.011852264404297, norm_rel=0.023155998438596725, ref_abs_avg=21.93137550354004, test_abs_avg=21.90679931640625
production_forward grad[60] vs paper_forward: mean_abs=0.6470879316329956, max_abs=4.75, mean_rel=0.15577368438243866, max_rel=915.6222534179688, norm_rel=0.02446015365421772, ref_abs_avg=26.508424758911133, test_abs_avg=26.510801315307617
production_forward grad[61] vs paper_forward: mean_abs=0.5968470573425293, max_abs=3.75, mean_rel=0.2487955391407013, max_rel=1499.9998779296875, norm_rel=0.02288397029042244, ref_abs_avg=26.021677017211914, test_abs_avg=26.023221969604492
production_forward grad[62] vs paper_forward: mean_abs=0.47760677337646484, max_abs=1.859375, mean_rel=0.1142503023147583, max_rel=5.677383899688721, norm_rel=0.0247921384871006, ref_abs_avg=19.44915199279785, test_abs_avg=19.453876495361328
production_forward grad[63] vs paper_forward: mean_abs=0.6079224944114685, max_abs=4.75, mean_rel=0.15825283527374268, max_rel=1502.3040771484375, norm_rel=0.024383388459682465, ref_abs_avg=24.977846145629883, test_abs_avg=24.978168487548828
production_forward grad[64] vs paper_forward: mean_abs=0.5692800283432007, max_abs=4.0, mean_rel=0.24332433938980103, max_rel=1874.9998779296875, norm_rel=0.02308126911520958, ref_abs_avg=24.666725158691406, test_abs_avg=24.67159080505371
production_forward grad[65] vs paper_forward: mean_abs=0.4634523391723633, max_abs=1.75, mean_rel=0.08130397647619247, max_rel=3.6799299716949463, norm_rel=0.023072149604558945, ref_abs_avg=20.087509155273438, test_abs_avg=20.088150024414062
production_forward grad[66] vs paper_forward: mean_abs=0.5760523676872253, max_abs=5.0, mean_rel=0.15328659117221832, max_rel=1389.135498046875, norm_rel=0.023893136531114578, ref_abs_avg=24.13117218017578, test_abs_avg=24.1315975189209
production_forward grad[67] vs paper_forward: mean_abs=0.5382272005081177, max_abs=3.8125, mean_rel=0.2244316190481186, max_rel=1312.4998779296875, norm_rel=0.022205568850040436, ref_abs_avg=24.204345703125, test_abs_avg=24.207374572753906
production_forward grad[68] vs paper_forward: mean_abs=0.435732364654541, max_abs=1.5625, mean_rel=0.0557175949215889, max_rel=1.867558479309082, norm_rel=0.022884339094161987, ref_abs_avg=19.504507064819336, test_abs_avg=19.511531829833984
production_forward grad[69] vs paper_forward: mean_abs=0.5544405579566956, max_abs=4.5, mean_rel=0.1435585916042328, max_rel=690.5640869140625, norm_rel=0.02360098995268345, ref_abs_avg=23.531551361083984, test_abs_avg=23.532573699951172
production_forward grad[70] vs paper_forward: mean_abs=0.5130298137664795, max_abs=4.0, mean_rel=0.20065709948539734, max_rel=1374.9998779296875, norm_rel=0.021710556000471115, ref_abs_avg=23.54145622253418, test_abs_avg=23.554500579833984
production_forward grad[71] vs paper_forward: mean_abs=0.42165207862854004, max_abs=1.625, mean_rel=0.1293402761220932, max_rel=10.02469253540039, norm_rel=0.02262311987578869, ref_abs_avg=18.484272003173828, test_abs_avg=18.48331069946289
production_forward grad[72] vs paper_forward: mean_abs=0.5319356322288513, max_abs=6.0, mean_rel=0.1443827748298645, max_rel=869.0838623046875, norm_rel=0.023444807156920433, ref_abs_avg=22.723976135253906, test_abs_avg=22.724365234375
production_forward grad[73] vs paper_forward: mean_abs=0.4880561828613281, max_abs=3.5, mean_rel=0.231489896774292, max_rel=1749.9998779296875, norm_rel=0.021587783470749855, ref_abs_avg=22.546998977661133, test_abs_avg=22.554588317871094
production_forward grad[74] vs paper_forward: mean_abs=0.48259544372558594, max_abs=1.75, mean_rel=0.06520247459411621, max_rel=1.875496745109558, norm_rel=0.022356318309903145, ref_abs_avg=21.934680938720703, test_abs_avg=21.945154190063477
production_forward grad[75] vs paper_forward: mean_abs=0.6034997701644897, max_abs=4.75, mean_rel=0.1585489809513092, max_rel=1257.49072265625, norm_rel=0.02478927932679653, ref_abs_avg=24.363048553466797, test_abs_avg=24.36263656616211
production_forward grad[76] vs paper_forward: mean_abs=0.5530796647071838, max_abs=4.5, mean_rel=0.24674659967422485, max_rel=2781.249755859375, norm_rel=0.023301219567656517, ref_abs_avg=23.76412582397461, test_abs_avg=23.75763511657715
production_forward grad[77] vs paper_forward: mean_abs=0.44423091411590576, max_abs=1.8125, mean_rel=0.23064231872558594, max_rel=62.646690368652344, norm_rel=0.023695237934589386, ref_abs_avg=18.479549407958984, test_abs_avg=18.451358795166016
production_forward grad[78] vs paper_forward: mean_abs=0.5483517646789551, max_abs=5.0, mean_rel=0.15550491213798523, max_rel=1519.5843505859375, norm_rel=0.024132348597049713, ref_abs_avg=22.761932373046875, test_abs_avg=22.762367248535156
production_forward grad[79] vs paper_forward: mean_abs=0.505400538444519, max_abs=3.5, mean_rel=0.23693397641181946, max_rel=1593.7498779296875, norm_rel=0.022557493299245834, ref_abs_avg=22.436784744262695, test_abs_avg=22.43577003479004
production_forward grad[80] vs paper_forward: mean_abs=0.3935379981994629, max_abs=1.5, mean_rel=0.06271316111087799, max_rel=2.564415693283081, norm_rel=0.02061363495886326, ref_abs_avg=18.98015022277832, test_abs_avg=18.983638763427734
production_forward grad[81] vs paper_forward: mean_abs=0.5070575475692749, max_abs=4.0625, mean_rel=0.15327081084251404, max_rel=948.6092529296875, norm_rel=0.023563357070088387, ref_abs_avg=21.620914459228516, test_abs_avg=21.623111724853516
production_forward grad[82] vs paper_forward: mean_abs=0.46613800525665283, max_abs=3.75, mean_rel=0.2030513882637024, max_rel=1578.1248779296875, norm_rel=0.021930130198597908, ref_abs_avg=21.226125717163086, test_abs_avg=21.22831916809082
production_forward grad[83] vs paper_forward: mean_abs=0.3681432008743286, max_abs=1.448974609375, mean_rel=0.19663098454475403, max_rel=31.914533615112305, norm_rel=0.021494489163160324, ref_abs_avg=17.326868057250977, test_abs_avg=17.33685302734375
production_forward grad[84] vs paper_forward: mean_abs=0.4765201807022095, max_abs=5.5, mean_rel=0.14851748943328857, max_rel=1155.175048828125, norm_rel=0.022869979962706566, ref_abs_avg=20.899038314819336, test_abs_avg=20.900386810302734
production_forward grad[85] vs paper_forward: mean_abs=0.432553768157959, max_abs=4.0, mean_rel=0.18643519282341003, max_rel=1156.25, norm_rel=0.021510720252990723, ref_abs_avg=20.195497512817383, test_abs_avg=20.192420959472656
production_forward grad[86] vs paper_forward: mean_abs=0.35274994373321533, max_abs=1.625, mean_rel=0.13581612706184387, max_rel=24.13167381286621, norm_rel=0.02135009504854679, ref_abs_avg=16.553306579589844, test_abs_avg=16.541454315185547
production_forward grad[87] vs paper_forward: mean_abs=0.44888031482696533, max_abs=4.5, mean_rel=0.13970865309238434, max_rel=779.7160034179688, norm_rel=0.022396229207515717, ref_abs_avg=20.133056640625, test_abs_avg=20.133647918701172
production_forward grad[88] vs paper_forward: mean_abs=0.4046768248081207, max_abs=3.5, mean_rel=0.19926869869232178, max_rel=1624.9998779296875, norm_rel=0.020471002906560898, ref_abs_avg=19.8834228515625, test_abs_avg=19.883668899536133
production_forward grad[89] vs paper_forward: mean_abs=0.332291841506958, max_abs=1.5, mean_rel=0.149336040019989, max_rel=21.55302619934082, norm_rel=0.02037326991558075, ref_abs_avg=16.058557510375977, test_abs_avg=16.05337142944336
production_forward grad[90] vs paper_forward: mean_abs=0.4218178987503052, max_abs=4.0, mean_rel=0.1356721967458725, max_rel=626.9374389648438, norm_rel=0.021977800875902176, ref_abs_avg=19.372718811035156, test_abs_avg=19.372610092163086
production_forward grad[91] vs paper_forward: mean_abs=0.38003456592559814, max_abs=3.5, mean_rel=0.18453997373580933, max_rel=1093.75, norm_rel=0.01991378888487816, ref_abs_avg=19.21910858154297, test_abs_avg=19.213516235351562
production_forward grad[92] vs paper_forward: mean_abs=0.299314022064209, max_abs=1.0625, mean_rel=0.08620712161064148, max_rel=7.419588088989258, norm_rel=0.018580153584480286, ref_abs_avg=16.111005783081055, test_abs_avg=16.103511810302734
production_forward grad[93] vs paper_forward: mean_abs=0.38832366466522217, max_abs=5.0, mean_rel=0.12797734141349792, max_rel=393.9986572265625, norm_rel=0.021291442215442657, ref_abs_avg=18.448062896728516, test_abs_avg=18.450037002563477
production_forward grad[94] vs paper_forward: mean_abs=0.35202479362487793, max_abs=4.046875, mean_rel=0.17582866549491882, max_rel=1187.5, norm_rel=0.01920785941183567, ref_abs_avg=18.44879722595215, test_abs_avg=18.455570220947266
production_forward grad[95] vs paper_forward: mean_abs=0.29233837127685547, max_abs=1.25, mean_rel=0.09463615715503693, max_rel=12.931034088134766, norm_rel=0.01899738237261772, ref_abs_avg=15.597412109375, test_abs_avg=15.589521408081055
production_forward grad[96] vs paper_forward: mean_abs=0.38065898418426514, max_abs=4.1875, mean_rel=0.13215301930904388, max_rel=823.8925170898438, norm_rel=0.020917808637022972, ref_abs_avg=18.488773345947266, test_abs_avg=18.48745346069336
production_forward grad[97] vs paper_forward: mean_abs=0.33574092388153076, max_abs=5.0, mean_rel=0.17892596125602722, max_rel=1093.75, norm_rel=0.018954487517476082, ref_abs_avg=17.992095947265625, test_abs_avg=17.987842559814453
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016634889179840684, max_abs=0.0391845703125
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008734133094549179, max_abs=0.375, mean_rel=0.07472285628318787, max_rel=101.0697021484375, norm_rel=0.02049100771546364, ref_abs_avg=0.4613272547721863, test_abs_avg=0.46132680773735046
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.446099758148193, max_abs=52.0, mean_rel=0.13486836850643158, max_rel=189.29164123535156, norm_rel=0.02047961764037609, ref_abs_avg=323.8179016113281, test_abs_avg=323.841796875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2453070878982544, max_abs=4.9375, mean_rel=0.14480426907539368, max_rel=25.272293090820312, norm_rel=0.02351217344403267, ref_abs_avg=52.27435302734375, test_abs_avg=52.24980163574219
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6292117834091187, max_abs=11.0, mean_rel=0.17108555138111115, max_rel=2881.6982421875, norm_rel=0.025270085781812668, ref_abs_avg=64.87458038330078, test_abs_avg=64.87638092041016
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5059648752212524, max_abs=9.75, mean_rel=0.44036048650741577, max_rel=4500.0, norm_rel=0.023795245215296745, ref_abs_avg=63.66494369506836, test_abs_avg=63.667335510253906
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1021013259887695, max_abs=4.5, mean_rel=0.09839418530464172, max_rel=17.450490951538086, norm_rel=0.02437051758170128, ref_abs_avg=46.364498138427734, test_abs_avg=46.359222412109375
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4449491500854492, max_abs=10.0, mean_rel=0.1665550023317337, max_rel=739.9090576171875, norm_rel=0.024970335885882378, ref_abs_avg=58.222412109375, test_abs_avg=58.224754333496094
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.341908574104309, max_abs=8.875, mean_rel=0.445230096578598, max_rel=5249.99951171875, norm_rel=0.023402439430356026, ref_abs_avg=57.59020233154297, test_abs_avg=57.58403015136719
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0347003936767578, max_abs=4.625, mean_rel=0.08934123814105988, max_rel=6.7259721755981445, norm_rel=0.02364632487297058, ref_abs_avg=44.22251892089844, test_abs_avg=44.215049743652344
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3244853019714355, max_abs=10.0, mean_rel=0.1909458339214325, max_rel=4368.66748046875, norm_rel=0.024843567982316017, ref_abs_avg=53.65225601196289, test_abs_avg=53.65435791015625
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2189171314239502, max_abs=8.0, mean_rel=0.3622362017631531, max_rel=3687.499755859375, norm_rel=0.02334488183259964, ref_abs_avg=52.507110595703125, test_abs_avg=52.50020217895508
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9318835735321045, max_abs=3.5, mean_rel=1.032238483428955, max_rel=481.4537353515625, norm_rel=0.022987864911556244, ref_abs_avg=40.313507080078125, test_abs_avg=40.35823059082031
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2166354656219482, max_abs=8.53125, mean_rel=0.17858189344406128, max_rel=2188.819580078125, norm_rel=0.024713462218642235, ref_abs_avg=49.50634765625, test_abs_avg=49.508262634277344
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1292698383331299, max_abs=7.0, mean_rel=0.3049817681312561, max_rel=4250.0, norm_rel=0.023060057312250137, ref_abs_avg=49.221893310546875, test_abs_avg=49.22304153442383
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8951654434204102, max_abs=3.625, mean_rel=0.10130299627780914, max_rel=7.071581840515137, norm_rel=0.025320542976260185, ref_abs_avg=34.59674835205078, test_abs_avg=34.74034881591797
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1356045007705688, max_abs=10.0, mean_rel=0.17511501908302307, max_rel=1880.611083984375, norm_rel=0.024427002295851707, ref_abs_avg=46.76941680908203, test_abs_avg=46.771942138671875
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.046342134475708, max_abs=7.0, mean_rel=0.24649016559123993, max_rel=2500.0, norm_rel=0.022857513278722763, ref_abs_avg=46.01008224487305, test_abs_avg=46.00972366333008
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8435053825378418, max_abs=3.625, mean_rel=0.12251110374927521, max_rel=9.508139610290527, norm_rel=0.022400271147489548, ref_abs_avg=36.993980407714844, test_abs_avg=36.956214904785156
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0790654420852661, max_abs=7.0, mean_rel=0.17060023546218872, max_rel=1430.638916015625, norm_rel=0.024306796491146088, ref_abs_avg=44.61968994140625, test_abs_avg=44.62203598022461
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9857763051986694, max_abs=5.75, mean_rel=0.34509381651878357, max_rel=3812.499755859375, norm_rel=0.022470196709036827, ref_abs_avg=44.02014923095703, test_abs_avg=44.01790237426758
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7811343669891357, max_abs=3.72265625, mean_rel=0.07441912591457367, max_rel=5.200375556945801, norm_rel=0.02249063178896904, ref_abs_avg=34.92405700683594, test_abs_avg=34.91814422607422
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0145448446273804, max_abs=7.5, mean_rel=0.1674211323261261, max_rel=1697.169921875, norm_rel=0.024193713441491127, ref_abs_avg=42.16720199584961, test_abs_avg=42.17009353637695
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.933580756187439, max_abs=6.25, mean_rel=0.3207702934741974, max_rel=2562.5, norm_rel=0.022498158738017082, ref_abs_avg=41.691978454589844, test_abs_avg=41.69404602050781
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7758934497833252, max_abs=3.25, mean_rel=0.12299758940935135, max_rel=17.388877868652344, norm_rel=0.024816229939460754, ref_abs_avg=32.034175872802734, test_abs_avg=31.967056274414062
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9641501903533936, max_abs=7.0, mean_rel=0.16631177067756653, max_rel=1495.328125, norm_rel=0.02391555719077587, ref_abs_avg=40.562591552734375, test_abs_avg=40.56346893310547
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8889813423156738, max_abs=5.5, mean_rel=0.2854258716106415, max_rel=2140.625, norm_rel=0.02246253378689289, ref_abs_avg=39.797645568847656, test_abs_avg=39.800106048583984
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8860718607902527, max_abs=3.25, mean_rel=0.2730235457420349, max_rel=30.371421813964844, norm_rel=0.02516397088766098, ref_abs_avg=34.44279479980469, test_abs_avg=34.475135803222656
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1310700178146362, max_abs=8.0, mean_rel=0.16667112708091736, max_rel=1588.2371826171875, norm_rel=0.02579682320356369, ref_abs_avg=44.08058547973633, test_abs_avg=44.079368591308594
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.053438663482666, max_abs=6.75, mean_rel=0.35705697536468506, max_rel=2453.125, norm_rel=0.024321865290403366, ref_abs_avg=43.533294677734375, test_abs_avg=43.526676177978516
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8065322637557983, max_abs=3.0, mean_rel=0.09002899378538132, max_rel=3.7058310508728027, norm_rel=0.024136599153280258, ref_abs_avg=33.34807205200195, test_abs_avg=33.40613555908203
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0281665325164795, max_abs=9.0, mean_rel=0.19126847386360168, max_rel=2407.077392578125, norm_rel=0.02605636790394783, ref_abs_avg=39.592552185058594, test_abs_avg=39.59355926513672
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9664753675460815, max_abs=6.375, mean_rel=0.26703953742980957, max_rel=2812.499755859375, norm_rel=0.024737492203712463, ref_abs_avg=39.24968719482422, test_abs_avg=39.252685546875
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7991256713867188, max_abs=3.125, mean_rel=0.11641456186771393, max_rel=11.87995719909668, norm_rel=0.02576066553592682, ref_abs_avg=31.24566650390625, test_abs_avg=31.26586151123047
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9665147066116333, max_abs=7.0, mean_rel=0.1902821958065033, max_rel=2972.239501953125, norm_rel=0.02589002065360546, ref_abs_avg=37.48176193237305, test_abs_avg=37.482215881347656
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.9006208181381226, max_abs=6.0, mean_rel=0.2614907920360565, max_rel=2250.0, norm_rel=0.024566181004047394, ref_abs_avg=36.75504684448242, test_abs_avg=36.75798034667969
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7018880844116211, max_abs=2.9375, mean_rel=0.07763373851776123, max_rel=7.484687805175781, norm_rel=0.02488238550722599, ref_abs_avg=28.61974334716797, test_abs_avg=28.59516143798828
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.900108814239502, max_abs=6.5, mean_rel=0.18529915809631348, max_rel=1696.4923095703125, norm_rel=0.025643009692430496, ref_abs_avg=35.24900436401367, test_abs_avg=35.250083923339844
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.849288821220398, max_abs=5.25, mean_rel=0.36419808864593506, max_rel=3312.499755859375, norm_rel=0.02438586950302124, ref_abs_avg=34.92790985107422, test_abs_avg=34.927486419677734
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6602952480316162, max_abs=2.75, mean_rel=0.1859632283449173, max_rel=35.485877990722656, norm_rel=0.024131493642926216, ref_abs_avg=27.149349212646484, test_abs_avg=27.176532745361328
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8495008945465088, max_abs=5.75, mean_rel=0.16560769081115723, max_rel=1503.0821533203125, norm_rel=0.025308940559625626, ref_abs_avg=33.651573181152344, test_abs_avg=33.65360641479492
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7909946441650391, max_abs=5.0, mean_rel=0.35853660106658936, max_rel=3249.999755859375, norm_rel=0.023808401077985764, ref_abs_avg=33.293155670166016, test_abs_avg=33.29225158691406
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6338821649551392, max_abs=2.6875, mean_rel=0.6846066117286682, max_rel=257.3216552734375, norm_rel=0.02377481572329998, ref_abs_avg=26.457294464111328, test_abs_avg=26.530029296875
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.8042739033699036, max_abs=6.0, mean_rel=0.15873530507087708, max_rel=1438.5255126953125, norm_rel=0.0250016488134861, ref_abs_avg=32.27116394042969, test_abs_avg=32.27154541015625
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7499828338623047, max_abs=4.5, mean_rel=0.2683505117893219, max_rel=2562.5, norm_rel=0.023680664598941803, ref_abs_avg=31.74160385131836, test_abs_avg=31.74127960205078
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5843892097473145, max_abs=2.5, mean_rel=0.14700520038604736, max_rel=14.04257869720459, norm_rel=0.02339695394039154, ref_abs_avg=24.250751495361328, test_abs_avg=24.243927001953125
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7726919651031494, max_abs=5.375, mean_rel=0.1585964858531952, max_rel=1042.9302978515625, norm_rel=0.024793505668640137, ref_abs_avg=31.186668395996094, test_abs_avg=31.186161041259766
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7146143913269043, max_abs=4.5, mean_rel=0.2298891842365265, max_rel=1937.4998779296875, norm_rel=0.023557810112833977, ref_abs_avg=30.42008399963379, test_abs_avg=30.42300796508789
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5585519075393677, max_abs=2.25, mean_rel=0.06980554759502411, max_rel=3.258845329284668, norm_rel=0.02240709215402603, ref_abs_avg=25.40735626220703, test_abs_avg=25.400789260864258
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7344101667404175, max_abs=6.0, mean_rel=0.16439463198184967, max_rel=1585.69921875, norm_rel=0.02457268536090851, ref_abs_avg=29.98481559753418, test_abs_avg=29.9879150390625
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6847385168075562, max_abs=4.25, mean_rel=0.23902229964733124, max_rel=1812.4998779296875, norm_rel=0.023349381983280182, ref_abs_avg=29.379653930664062, test_abs_avg=29.378231048583984
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6063013076782227, max_abs=2.25, mean_rel=0.10843785107135773, max_rel=6.773121356964111, norm_rel=0.02189081721007824, ref_abs_avg=27.551250457763672, test_abs_avg=27.530418395996094
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.809276282787323, max_abs=5.59375, mean_rel=0.16913852095603943, max_rel=1016.9989624023438, norm_rel=0.026030097156763077, ref_abs_avg=31.20907974243164, test_abs_avg=31.213645935058594
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7493323087692261, max_abs=5.21875, mean_rel=0.2895253896713257, max_rel=2250.0, norm_rel=0.024448944255709648, ref_abs_avg=30.71146583557129, test_abs_avg=30.716781616210938
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5799756050109863, max_abs=2.1796875, mean_rel=0.21180568635463715, max_rel=41.12691116333008, norm_rel=0.02389153279364109, ref_abs_avg=23.92996597290039, test_abs_avg=23.90974998474121
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7400729656219482, max_abs=5.0, mean_rel=0.16504338383674622, max_rel=803.5713500976562, norm_rel=0.02591162733733654, ref_abs_avg=28.611303329467773, test_abs_avg=28.61054229736328
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6901686191558838, max_abs=4.875, mean_rel=0.233370840549469, max_rel=1999.9998779296875, norm_rel=0.02444346435368061, ref_abs_avg=28.31093978881836, test_abs_avg=28.308557510375977
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5557870864868164, max_abs=2.125, mean_rel=0.11338958144187927, max_rel=10.903018951416016, norm_rel=0.0250387005507946, ref_abs_avg=22.265472412109375, test_abs_avg=22.23272705078125
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6958498358726501, max_abs=5.0, mean_rel=0.1603366732597351, max_rel=908.44384765625, norm_rel=0.0252645555883646, ref_abs_avg=27.59450340270996, test_abs_avg=27.593692779541016
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6447848677635193, max_abs=4.625, mean_rel=0.2614205777645111, max_rel=1843.7498779296875, norm_rel=0.024029787629842758, ref_abs_avg=26.897722244262695, test_abs_avg=26.893722534179688
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5230729579925537, max_abs=2.0, mean_rel=0.1072545275092125, max_rel=18.581323623657227, norm_rel=0.0234677754342556, ref_abs_avg=21.93137550354004, test_abs_avg=21.926883697509766
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6546964645385742, max_abs=5.0, mean_rel=0.16225433349609375, max_rel=743.91748046875, norm_rel=0.024731682613492012, ref_abs_avg=26.508424758911133, test_abs_avg=26.509811401367188
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6052945852279663, max_abs=4.0, mean_rel=0.24966813623905182, max_rel=1468.7498779296875, norm_rel=0.02321821264922619, ref_abs_avg=26.021677017211914, test_abs_avg=26.02345085144043
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.5012292861938477, max_abs=1.75, mean_rel=0.11754143238067627, max_rel=5.9505228996276855, norm_rel=0.025625158101320267, ref_abs_avg=19.44915199279785, test_abs_avg=19.460254669189453
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.614750325679779, max_abs=4.5, mean_rel=0.16102555394172668, max_rel=1089.3712158203125, norm_rel=0.02464909665286541, ref_abs_avg=24.977846145629883, test_abs_avg=24.97820281982422
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5727574825286865, max_abs=4.125, mean_rel=0.23835012316703796, max_rel=1374.9998779296875, norm_rel=0.023228077217936516, ref_abs_avg=24.666725158691406, test_abs_avg=24.671737670898438
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4522819519042969, max_abs=2.0, mean_rel=0.09452014416456223, max_rel=8.351256370544434, norm_rel=0.022818489000201225, ref_abs_avg=20.087509155273438, test_abs_avg=20.098953247070312
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5820902585983276, max_abs=5.0, mean_rel=0.1560833603143692, max_rel=1256.283447265625, norm_rel=0.02412823773920536, ref_abs_avg=24.13117218017578, test_abs_avg=24.13178253173828
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5439125299453735, max_abs=3.875, mean_rel=0.2446037232875824, max_rel=1359.3748779296875, norm_rel=0.02244645543396473, ref_abs_avg=24.204345703125, test_abs_avg=24.20578956604004
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4308624267578125, max_abs=1.625, mean_rel=0.057604316622018814, max_rel=3.097273826599121, norm_rel=0.0228804312646389, ref_abs_avg=19.504507064819336, test_abs_avg=19.500398635864258
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5590363144874573, max_abs=4.5, mean_rel=0.14340586960315704, max_rel=584.5978393554688, norm_rel=0.0237896665930748, ref_abs_avg=23.531551361083984, test_abs_avg=23.532058715820312
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5198649168014526, max_abs=3.75, mean_rel=0.20919311046600342, max_rel=1062.5, norm_rel=0.021993981674313545, ref_abs_avg=23.54145622253418, test_abs_avg=23.548858642578125
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4319145679473877, max_abs=1.75, mean_rel=0.10608911514282227, max_rel=4.441915988922119, norm_rel=0.02312113158404827, ref_abs_avg=18.484272003173828, test_abs_avg=18.45594024658203
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5367529392242432, max_abs=4.5, mean_rel=0.14335188269615173, max_rel=803.6228637695312, norm_rel=0.023637808859348297, ref_abs_avg=22.723976135253906, test_abs_avg=22.723569869995117
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4970758557319641, max_abs=3.75, mean_rel=0.21270683407783508, max_rel=1468.7498779296875, norm_rel=0.021971790120005608, ref_abs_avg=22.546998977661133, test_abs_avg=22.555545806884766
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4855813980102539, max_abs=2.03125, mean_rel=0.07481397688388824, max_rel=3.825516700744629, norm_rel=0.02226807363331318, ref_abs_avg=21.934680938720703, test_abs_avg=21.959346771240234
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.6107701063156128, max_abs=4.5, mean_rel=0.16365113854408264, max_rel=1125.919921875, norm_rel=0.025063106790184975, ref_abs_avg=24.363048553466797, test_abs_avg=24.362451553344727
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5574178695678711, max_abs=5.0, mean_rel=0.2460239827632904, max_rel=2406.25, norm_rel=0.023503074422478676, ref_abs_avg=23.76412582397461, test_abs_avg=23.76081085205078
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.44537317752838135, max_abs=1.75, mean_rel=0.21698717772960663, max_rel=39.697750091552734, norm_rel=0.02400897443294525, ref_abs_avg=18.479549407958984, test_abs_avg=18.417072296142578
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5529478192329407, max_abs=5.0, mean_rel=0.15345564484596252, max_rel=1207.4952392578125, norm_rel=0.024292990565299988, ref_abs_avg=22.761932373046875, test_abs_avg=22.761768341064453
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.51422518491745, max_abs=4.1875, mean_rel=0.24310290813446045, max_rel=1499.9998779296875, norm_rel=0.022937877103686333, ref_abs_avg=22.436784744262695, test_abs_avg=22.435453414916992
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.39756011962890625, max_abs=1.375, mean_rel=0.07227230072021484, max_rel=2.5931670665740967, norm_rel=0.02072560042142868, ref_abs_avg=18.98015022277832, test_abs_avg=18.996274948120117
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5128498077392578, max_abs=5.0, mean_rel=0.1537514328956604, max_rel=816.3563842773438, norm_rel=0.023790499195456505, ref_abs_avg=21.620914459228516, test_abs_avg=21.622886657714844
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.47206366062164307, max_abs=4.0, mean_rel=0.20947574079036713, max_rel=1374.9998779296875, norm_rel=0.022204305976629257, ref_abs_avg=21.226125717163086, test_abs_avg=21.226707458496094
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3721129894256592, max_abs=1.5, mean_rel=0.2133973091840744, max_rel=30.28917694091797, norm_rel=0.02158663049340248, ref_abs_avg=17.326868057250977, test_abs_avg=17.339317321777344
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4810541272163391, max_abs=6.5, mean_rel=0.15146717429161072, max_rel=1718.35986328125, norm_rel=0.023084694519639015, ref_abs_avg=20.899038314819336, test_abs_avg=20.899621963500977
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.437183141708374, max_abs=5.0, mean_rel=0.18825756013393402, max_rel=1257.8125, norm_rel=0.021748024970293045, ref_abs_avg=20.195497512817383, test_abs_avg=20.195823669433594
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3602548837661743, max_abs=1.375, mean_rel=0.14368177950382233, max_rel=33.544063568115234, norm_rel=0.021495336666703224, ref_abs_avg=16.553306579589844, test_abs_avg=16.548742294311523
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4515106678009033, max_abs=4.5439453125, mean_rel=0.14038890600204468, max_rel=839.0284423828125, norm_rel=0.022526275366544724, ref_abs_avg=20.133056640625, test_abs_avg=20.13385772705078
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4028495252132416, max_abs=3.75, mean_rel=0.19730082154273987, max_rel=1687.4998779296875, norm_rel=0.020319608971476555, ref_abs_avg=19.8834228515625, test_abs_avg=19.880977630615234
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.32718586921691895, max_abs=1.375, mean_rel=0.1607169657945633, max_rel=24.368934631347656, norm_rel=0.02009865827858448, ref_abs_avg=16.058557510375977, test_abs_avg=16.051908493041992
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4243990182876587, max_abs=4.375, mean_rel=0.13579018414020538, max_rel=814.9667358398438, norm_rel=0.022100411355495453, ref_abs_avg=19.372718811035156, test_abs_avg=19.372447967529297
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3828628659248352, max_abs=3.1875, mean_rel=0.18235638737678528, max_rel=1093.75, norm_rel=0.02004707045853138, ref_abs_avg=19.21910858154297, test_abs_avg=19.21891975402832
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.30113351345062256, max_abs=1.25, mean_rel=0.10439141094684601, max_rel=14.437905311584473, norm_rel=0.018949806690216064, ref_abs_avg=16.111005783081055, test_abs_avg=16.109027862548828
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.38927921652793884, max_abs=5.0, mean_rel=0.1281842589378357, max_rel=681.9171142578125, norm_rel=0.02133379504084587, ref_abs_avg=18.448062896728516, test_abs_avg=18.449386596679688
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3554353713989258, max_abs=3.5, mean_rel=0.16817797720432281, max_rel=968.7499389648438, norm_rel=0.019415097311139107, ref_abs_avg=18.44879722595215, test_abs_avg=18.451953887939453
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.3012723922729492, max_abs=1.25, mean_rel=0.08287359774112701, max_rel=8.17479133605957, norm_rel=0.019957613199949265, ref_abs_avg=15.597412109375, test_abs_avg=15.591676712036133
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.38059738278388977, max_abs=5.0, mean_rel=0.1329418271780014, max_rel=948.15478515625, norm_rel=0.02092875726521015, ref_abs_avg=18.488773345947266, test_abs_avg=18.487567901611328
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3367406725883484, max_abs=6.0, mean_rel=0.17536085844039917, max_rel=843.7499389648438, norm_rel=0.019022522494196892, ref_abs_avg=17.992095947265625, test_abs_avg=17.986919403076172
production_forward2 vs paper_forward output: mean_abs=0.0016602205578237772, max_abs=0.0391845703125
production_forward2 grad[0] vs paper_forward: mean_abs=0.008724354207515717, max_abs=0.40625, mean_rel=0.07462328672409058, max_rel=116.61962890625, norm_rel=0.020474400371313095, ref_abs_avg=0.4613272547721863, test_abs_avg=0.46132320165634155
production_forward2 grad[1] vs paper_forward: mean_abs=7.461524963378906, max_abs=60.0, mean_rel=0.13034874200820923, max_rel=192.6152801513672, norm_rel=0.02052334137260914, ref_abs_avg=323.8179016113281, test_abs_avg=323.7970886230469
production_forward2 grad[2] vs paper_forward: mean_abs=1.218669056892395, max_abs=5.5, mean_rel=0.16187940537929535, max_rel=22.42327117919922, norm_rel=0.0234778244048357, ref_abs_avg=52.27435302734375, test_abs_avg=52.26420211791992
production_forward2 grad[3] vs paper_forward: mean_abs=1.6283719539642334, max_abs=11.5, mean_rel=0.1708928644657135, max_rel=3895.32275390625, norm_rel=0.025262756273150444, ref_abs_avg=64.87458038330078, test_abs_avg=64.87400817871094
production_forward2 grad[4] vs paper_forward: mean_abs=1.5076494216918945, max_abs=9.375, mean_rel=0.4181581735610962, max_rel=4125.0, norm_rel=0.023810965940356255, ref_abs_avg=63.66494369506836, test_abs_avg=63.66542053222656
production_forward2 grad[5] vs paper_forward: mean_abs=1.124509334564209, max_abs=5.0, mean_rel=0.09493744373321533, max_rel=13.818063735961914, norm_rel=0.024440791457891464, ref_abs_avg=46.364498138427734, test_abs_avg=46.38188171386719
production_forward2 grad[6] vs paper_forward: mean_abs=1.4460445642471313, max_abs=10.0, mean_rel=0.17324358224868774, max_rel=1929.54833984375, norm_rel=0.024997392669320107, ref_abs_avg=58.222412109375, test_abs_avg=58.22317123413086
production_forward2 grad[7] vs paper_forward: mean_abs=1.3470453023910522, max_abs=9.5, mean_rel=0.43444573879241943, max_rel=5000.0, norm_rel=0.02348320744931698, ref_abs_avg=57.59020233154297, test_abs_avg=57.581138610839844
production_forward2 grad[8] vs paper_forward: mean_abs=1.047842025756836, max_abs=4.75, mean_rel=0.09329831600189209, max_rel=5.901331901550293, norm_rel=0.0233693215996027, ref_abs_avg=44.22251892089844, test_abs_avg=44.20858383178711
production_forward2 grad[9] vs paper_forward: mean_abs=1.3225123882293701, max_abs=10.0, mean_rel=0.19051486253738403, max_rel=4231.2841796875, norm_rel=0.024823438376188278, ref_abs_avg=53.65225601196289, test_abs_avg=53.65397644042969
production_forward2 grad[10] vs paper_forward: mean_abs=1.2193410396575928, max_abs=7.65625, mean_rel=0.3720206022262573, max_rel=3687.499755859375, norm_rel=0.023343201726675034, ref_abs_avg=52.507110595703125, test_abs_avg=52.4995002746582
production_forward2 grad[11] vs paper_forward: mean_abs=0.9202024936676025, max_abs=3.75, mean_rel=0.19628824293613434, max_rel=55.16433334350586, norm_rel=0.022971563041210175, ref_abs_avg=40.313507080078125, test_abs_avg=40.354164123535156
production_forward2 grad[12] vs paper_forward: mean_abs=1.2146871089935303, max_abs=8.0, mean_rel=0.17191201448440552, max_rel=1755.16064453125, norm_rel=0.024684065952897072, ref_abs_avg=49.50634765625, test_abs_avg=49.506324768066406
production_forward2 grad[13] vs paper_forward: mean_abs=1.1286662817001343, max_abs=7.0, mean_rel=0.3017864525318146, max_rel=2999.999755859375, norm_rel=0.023037148639559746, ref_abs_avg=49.221893310546875, test_abs_avg=49.221134185791016
production_forward2 grad[14] vs paper_forward: mean_abs=0.9183206558227539, max_abs=3.25, mean_rel=0.0996873527765274, max_rel=6.081973075866699, norm_rel=0.026035018265247345, ref_abs_avg=34.59674835205078, test_abs_avg=34.75101089477539
production_forward2 grad[15] vs paper_forward: mean_abs=1.1375045776367188, max_abs=8.0, mean_rel=0.17394185066223145, max_rel=1915.0174560546875, norm_rel=0.024454040452837944, ref_abs_avg=46.76941680908203, test_abs_avg=46.77375411987305
production_forward2 grad[16] vs paper_forward: mean_abs=1.0432500839233398, max_abs=6.0, mean_rel=0.2711464464664459, max_rel=2593.749755859375, norm_rel=0.022782325744628906, ref_abs_avg=46.01008224487305, test_abs_avg=46.00796127319336
production_forward2 grad[17] vs paper_forward: mean_abs=0.8310893177986145, max_abs=3.625, mean_rel=0.12753577530384064, max_rel=16.151023864746094, norm_rel=0.021941332146525383, ref_abs_avg=36.993980407714844, test_abs_avg=36.92021942138672
production_forward2 grad[18] vs paper_forward: mean_abs=1.078060269355774, max_abs=8.0, mean_rel=0.16927160322666168, max_rel=2081.040771484375, norm_rel=0.024302177131175995, ref_abs_avg=44.61968994140625, test_abs_avg=44.619384765625
production_forward2 grad[19] vs paper_forward: mean_abs=0.9859492778778076, max_abs=5.625, mean_rel=0.3384166657924652, max_rel=3062.499755859375, norm_rel=0.022476600483059883, ref_abs_avg=44.02014923095703, test_abs_avg=44.01610565185547
production_forward2 grad[20] vs paper_forward: mean_abs=0.7964038848876953, max_abs=3.81640625, mean_rel=0.07651063054800034, max_rel=5.3313398361206055, norm_rel=0.022564152255654335, ref_abs_avg=34.92405700683594, test_abs_avg=34.9324951171875
production_forward2 grad[21] vs paper_forward: mean_abs=1.0128709077835083, max_abs=7.0, mean_rel=0.1720726191997528, max_rel=2059.643310546875, norm_rel=0.024162709712982178, ref_abs_avg=42.16720199584961, test_abs_avg=42.16975402832031
production_forward2 grad[22] vs paper_forward: mean_abs=0.9318479895591736, max_abs=6.0625, mean_rel=0.32669031620025635, max_rel=2749.999755859375, norm_rel=0.02245425432920456, ref_abs_avg=41.691978454589844, test_abs_avg=41.694801330566406
production_forward2 grad[23] vs paper_forward: mean_abs=0.7724065780639648, max_abs=3.0, mean_rel=0.1419452577829361, max_rel=21.495210647583008, norm_rel=0.024503855034708977, ref_abs_avg=32.034175872802734, test_abs_avg=31.972238540649414
production_forward2 grad[24] vs paper_forward: mean_abs=0.9638409614562988, max_abs=6.0, mean_rel=0.1715439260005951, max_rel=1802.1820068359375, norm_rel=0.02389782853424549, ref_abs_avg=40.562591552734375, test_abs_avg=40.56255340576172
production_forward2 grad[25] vs paper_forward: mean_abs=0.887432873249054, max_abs=5.5, mean_rel=0.28312164545059204, max_rel=2687.499755859375, norm_rel=0.022416725754737854, ref_abs_avg=39.797645568847656, test_abs_avg=39.80295944213867
production_forward2 grad[26] vs paper_forward: mean_abs=0.8929345607757568, max_abs=3.25, mean_rel=0.18966332077980042, max_rel=22.55596351623535, norm_rel=0.025233032181859016, ref_abs_avg=34.44279479980469, test_abs_avg=34.467079162597656
production_forward2 grad[27] vs paper_forward: mean_abs=1.129210114479065, max_abs=8.0, mean_rel=0.17035168409347534, max_rel=1810.9459228515625, norm_rel=0.025759926065802574, ref_abs_avg=44.08058547973633, test_abs_avg=44.07893371582031
production_forward2 grad[28] vs paper_forward: mean_abs=1.0474889278411865, max_abs=6.5, mean_rel=0.3780209422111511, max_rel=3093.749755859375, norm_rel=0.024164732545614243, ref_abs_avg=43.533294677734375, test_abs_avg=43.52811050415039
production_forward2 grad[29] vs paper_forward: mean_abs=0.8407115936279297, max_abs=3.0, mean_rel=0.09367293864488602, max_rel=6.822788238525391, norm_rel=0.025142552331089973, ref_abs_avg=33.34807205200195, test_abs_avg=33.39227294921875
production_forward2 grad[30] vs paper_forward: mean_abs=1.0277923345565796, max_abs=8.0, mean_rel=0.18373136222362518, max_rel=2562.83544921875, norm_rel=0.026054512709379196, ref_abs_avg=39.592552185058594, test_abs_avg=39.59266662597656
production_forward2 grad[31] vs paper_forward: mean_abs=0.969662070274353, max_abs=6.5, mean_rel=0.28018781542778015, max_rel=3374.999755859375, norm_rel=0.024794142693281174, ref_abs_avg=39.24968719482422, test_abs_avg=39.25286865234375
production_forward2 grad[32] vs paper_forward: mean_abs=0.7938940525054932, max_abs=3.25, mean_rel=0.1118299663066864, max_rel=9.661916732788086, norm_rel=0.025633351877331734, ref_abs_avg=31.24566650390625, test_abs_avg=31.242666244506836
production_forward2 grad[33] vs paper_forward: mean_abs=0.9655753970146179, max_abs=7.0, mean_rel=0.18903446197509766, max_rel=2986.260009765625, norm_rel=0.025873219594359398, ref_abs_avg=37.48176193237305, test_abs_avg=37.48170852661133
production_forward2 grad[34] vs paper_forward: mean_abs=0.8977531790733337, max_abs=5.75, mean_rel=0.2613261342048645, max_rel=2187.5, norm_rel=0.02449457161128521, ref_abs_avg=36.75504684448242, test_abs_avg=36.7580680847168
production_forward2 grad[35] vs paper_forward: mean_abs=0.7034711837768555, max_abs=3.25, mean_rel=0.08283407986164093, max_rel=6.890483379364014, norm_rel=0.025060681626200676, ref_abs_avg=28.61974334716797, test_abs_avg=28.6212100982666
production_forward2 grad[36] vs paper_forward: mean_abs=0.8981866836547852, max_abs=6.5, mean_rel=0.18802125751972198, max_rel=1720.899169921875, norm_rel=0.025597065687179565, ref_abs_avg=35.24900436401367, test_abs_avg=35.250003814697266
production_forward2 grad[37] vs paper_forward: mean_abs=0.8474780321121216, max_abs=5.5, mean_rel=0.34180524945259094, max_rel=3374.999755859375, norm_rel=0.02433127723634243, ref_abs_avg=34.92790985107422, test_abs_avg=34.92498779296875
production_forward2 grad[38] vs paper_forward: mean_abs=0.6697432994842529, max_abs=2.75, mean_rel=0.22377224266529083, max_rel=40.29315948486328, norm_rel=0.024922985583543777, ref_abs_avg=27.149349212646484, test_abs_avg=27.20926284790039
production_forward2 grad[39] vs paper_forward: mean_abs=0.8479001522064209, max_abs=6.5, mean_rel=0.16462691128253937, max_rel=1118.7626953125, norm_rel=0.02524954453110695, ref_abs_avg=33.651573181152344, test_abs_avg=33.654151916503906
production_forward2 grad[40] vs paper_forward: mean_abs=0.7909789681434631, max_abs=4.75, mean_rel=0.33725565671920776, max_rel=3374.999755859375, norm_rel=0.02382785454392433, ref_abs_avg=33.293155670166016, test_abs_avg=33.29145050048828
production_forward2 grad[41] vs paper_forward: mean_abs=0.6398187875747681, max_abs=2.625, mean_rel=1.0938053131103516, max_rel=461.45849609375, norm_rel=0.02419331483542919, ref_abs_avg=26.457294464111328, test_abs_avg=26.496517181396484
production_forward2 grad[42] vs paper_forward: mean_abs=0.8046362400054932, max_abs=5.5, mean_rel=0.1617538332939148, max_rel=1449.789794921875, norm_rel=0.024998778477311134, ref_abs_avg=32.27116394042969, test_abs_avg=32.2712287902832
production_forward2 grad[43] vs paper_forward: mean_abs=0.7487892508506775, max_abs=5.0, mean_rel=0.2600489854812622, max_rel=2593.749755859375, norm_rel=0.023646797984838486, ref_abs_avg=31.74160385131836, test_abs_avg=31.74241828918457
production_forward2 grad[44] vs paper_forward: mean_abs=0.567042350769043, max_abs=2.5, mean_rel=0.127456396818161, max_rel=13.501322746276855, norm_rel=0.022611180320382118, ref_abs_avg=24.250751495361328, test_abs_avg=24.260784149169922
production_forward2 grad[45] vs paper_forward: mean_abs=0.7711354494094849, max_abs=5.5, mean_rel=0.16175059974193573, max_rel=1013.7413330078125, norm_rel=0.024730606004595757, ref_abs_avg=31.186668395996094, test_abs_avg=31.186058044433594
production_forward2 grad[46] vs paper_forward: mean_abs=0.7140734195709229, max_abs=4.5, mean_rel=0.22339719533920288, max_rel=2125.0, norm_rel=0.023554058745503426, ref_abs_avg=30.42008399963379, test_abs_avg=30.420804977416992
production_forward2 grad[47] vs paper_forward: mean_abs=0.5623400211334229, max_abs=2.125, mean_rel=0.07182934880256653, max_rel=2.9202640056610107, norm_rel=0.022226694971323013, ref_abs_avg=25.40735626220703, test_abs_avg=25.402202606201172
production_forward2 grad[48] vs paper_forward: mean_abs=0.7332381010055542, max_abs=5.0, mean_rel=0.1671917736530304, max_rel=1585.69921875, norm_rel=0.02453002519905567, ref_abs_avg=29.98481559753418, test_abs_avg=29.987380981445312
production_forward2 grad[49] vs paper_forward: mean_abs=0.685427188873291, max_abs=4.75, mean_rel=0.22631901502609253, max_rel=2093.75, norm_rel=0.023365870118141174, ref_abs_avg=29.379653930664062, test_abs_avg=29.38100814819336
production_forward2 grad[50] vs paper_forward: mean_abs=0.6057829856872559, max_abs=2.0, mean_rel=0.11895018815994263, max_rel=11.429348945617676, norm_rel=0.02220003865659237, ref_abs_avg=27.551250457763672, test_abs_avg=27.561866760253906
production_forward2 grad[51] vs paper_forward: mean_abs=0.8073562383651733, max_abs=6.5, mean_rel=0.16706424951553345, max_rel=979.3470458984375, norm_rel=0.025967668741941452, ref_abs_avg=31.20907974243164, test_abs_avg=31.213706970214844
production_forward2 grad[52] vs paper_forward: mean_abs=0.7455476522445679, max_abs=4.78125, mean_rel=0.2865262031555176, max_rel=2421.875, norm_rel=0.024340655654668808, ref_abs_avg=30.71146583557129, test_abs_avg=30.719009399414062
production_forward2 grad[53] vs paper_forward: mean_abs=0.5765328407287598, max_abs=2.02734375, mean_rel=0.185735285282135, max_rel=30.601661682128906, norm_rel=0.023726720362901688, ref_abs_avg=23.92996597290039, test_abs_avg=23.918596267700195
production_forward2 grad[54] vs paper_forward: mean_abs=0.7403456568717957, max_abs=5.25, mean_rel=0.16473214328289032, max_rel=891.7263793945312, norm_rel=0.025901922956109047, ref_abs_avg=28.611303329467773, test_abs_avg=28.609920501708984
production_forward2 grad[55] vs paper_forward: mean_abs=0.6879223585128784, max_abs=4.75, mean_rel=0.2380961775779724, max_rel=2187.5, norm_rel=0.024370865896344185, ref_abs_avg=28.31093978881836, test_abs_avg=28.30864143371582
production_forward2 grad[56] vs paper_forward: mean_abs=0.5463647842407227, max_abs=2.25, mean_rel=0.09862568974494934, max_rel=8.30681324005127, norm_rel=0.024762345477938652, ref_abs_avg=22.265472412109375, test_abs_avg=22.252426147460938
production_forward2 grad[57] vs paper_forward: mean_abs=0.6947318911552429, max_abs=5.5, mean_rel=0.15873713791370392, max_rel=552.4794311523438, norm_rel=0.025223398581147194, ref_abs_avg=27.59450340270996, test_abs_avg=27.593036651611328
production_forward2 grad[58] vs paper_forward: mean_abs=0.6426236629486084, max_abs=4.25, mean_rel=0.2740001678466797, max_rel=1749.9998779296875, norm_rel=0.023948652669787407, ref_abs_avg=26.897722244262695, test_abs_avg=26.893577575683594
production_forward2 grad[59] vs paper_forward: mean_abs=0.5221881866455078, max_abs=2.25, mean_rel=0.1008678674697876, max_rel=14.226046562194824, norm_rel=0.023692941293120384, ref_abs_avg=21.93137550354004, test_abs_avg=21.900455474853516
production_forward2 grad[60] vs paper_forward: mean_abs=0.6537785530090332, max_abs=4.5, mean_rel=0.1599496603012085, max_rel=773.6366577148438, norm_rel=0.02470369264483452, ref_abs_avg=26.508424758911133, test_abs_avg=26.509876251220703
production_forward2 grad[61] vs paper_forward: mean_abs=0.6045618057250977, max_abs=4.0, mean_rel=0.2538991868495941, max_rel=1499.9998779296875, norm_rel=0.023171335458755493, ref_abs_avg=26.021677017211914, test_abs_avg=26.022106170654297
production_forward2 grad[62] vs paper_forward: mean_abs=0.4895009994506836, max_abs=1.9375, mean_rel=0.10683028399944305, max_rel=5.813953399658203, norm_rel=0.02526852674782276, ref_abs_avg=19.44915199279785, test_abs_avg=19.45848846435547
production_forward2 grad[63] vs paper_forward: mean_abs=0.6137994527816772, max_abs=4.5, mean_rel=0.15796786546707153, max_rel=1083.7637939453125, norm_rel=0.024607552215456963, ref_abs_avg=24.977846145629883, test_abs_avg=24.978099822998047
production_forward2 grad[64] vs paper_forward: mean_abs=0.5755832195281982, max_abs=4.0, mean_rel=0.24321326613426208, max_rel=1749.9998779296875, norm_rel=0.023332176730036736, ref_abs_avg=24.666725158691406, test_abs_avg=24.671586990356445
production_forward2 grad[65] vs paper_forward: mean_abs=0.4627561569213867, max_abs=2.0, mean_rel=0.08429348468780518, max_rel=5.662723541259766, norm_rel=0.02323915995657444, ref_abs_avg=20.087509155273438, test_abs_avg=20.081295013427734
production_forward2 grad[66] vs paper_forward: mean_abs=0.5806286931037903, max_abs=6.0, mean_rel=0.15403813123703003, max_rel=1298.5545654296875, norm_rel=0.024073902517557144, ref_abs_avg=24.13117218017578, test_abs_avg=24.13117027282715
production_forward2 grad[67] vs paper_forward: mean_abs=0.5430015325546265, max_abs=3.75, mean_rel=0.23422960937023163, max_rel=1531.2498779296875, norm_rel=0.022398285567760468, ref_abs_avg=24.204345703125, test_abs_avg=24.20662498474121
production_forward2 grad[68] vs paper_forward: mean_abs=0.43159914016723633, max_abs=1.65625, mean_rel=0.05383189022541046, max_rel=1.873304843902588, norm_rel=0.022565366700291634, ref_abs_avg=19.504507064819336, test_abs_avg=19.50715446472168
production_forward2 grad[69] vs paper_forward: mean_abs=0.5583357810974121, max_abs=4.5, mean_rel=0.14365096390247345, max_rel=568.5423583984375, norm_rel=0.023755978792905807, ref_abs_avg=23.531551361083984, test_abs_avg=23.531967163085938
production_forward2 grad[70] vs paper_forward: mean_abs=0.5170809626579285, max_abs=4.0, mean_rel=0.21207426488399506, max_rel=1437.4998779296875, norm_rel=0.021872250363230705, ref_abs_avg=23.54145622253418, test_abs_avg=23.553787231445312
production_forward2 grad[71] vs paper_forward: mean_abs=0.42403697967529297, max_abs=1.6875, mean_rel=0.10706569999456406, max_rel=6.87797737121582, norm_rel=0.022913983091711998, ref_abs_avg=18.484272003173828, test_abs_avg=18.464893341064453
production_forward2 grad[72] vs paper_forward: mean_abs=0.5350810289382935, max_abs=4.5, mean_rel=0.1459212452173233, max_rel=1232.0946044921875, norm_rel=0.02357817254960537, ref_abs_avg=22.723976135253906, test_abs_avg=22.723657608032227
production_forward2 grad[73] vs paper_forward: mean_abs=0.49272239208221436, max_abs=3.5, mean_rel=0.2299308031797409, max_rel=1781.2498779296875, norm_rel=0.02178395725786686, ref_abs_avg=22.546998977661133, test_abs_avg=22.554676055908203
production_forward2 grad[74] vs paper_forward: mean_abs=0.49941158294677734, max_abs=1.75, mean_rel=0.07021559774875641, max_rel=1.6270866394042969, norm_rel=0.02275758981704712, ref_abs_avg=21.934680938720703, test_abs_avg=21.955589294433594
production_forward2 grad[75] vs paper_forward: mean_abs=0.6086413860321045, max_abs=5.0, mean_rel=0.16310474276542664, max_rel=1494.3182373046875, norm_rel=0.024996714666485786, ref_abs_avg=24.363048553466797, test_abs_avg=24.361940383911133
production_forward2 grad[76] vs paper_forward: mean_abs=0.5587075352668762, max_abs=4.75, mean_rel=0.25815635919570923, max_rel=3312.499755859375, norm_rel=0.023527270182967186, ref_abs_avg=23.76412582397461, test_abs_avg=23.756893157958984
production_forward2 grad[77] vs paper_forward: mean_abs=0.44647037982940674, max_abs=1.75, mean_rel=0.22163693606853485, max_rel=53.67573928833008, norm_rel=0.023780187591910362, ref_abs_avg=18.479549407958984, test_abs_avg=18.437030792236328
production_forward2 grad[78] vs paper_forward: mean_abs=0.5528371334075928, max_abs=5.0, mean_rel=0.1531696915626526, max_rel=1195.02490234375, norm_rel=0.024299833923578262, ref_abs_avg=22.761932373046875, test_abs_avg=22.761768341064453
production_forward2 grad[79] vs paper_forward: mean_abs=0.5100531578063965, max_abs=3.625, mean_rel=0.24651344120502472, max_rel=1812.4998779296875, norm_rel=0.02275916561484337, ref_abs_avg=22.436784744262695, test_abs_avg=22.435192108154297
production_forward2 grad[80] vs paper_forward: mean_abs=0.39020657539367676, max_abs=1.75, mean_rel=0.06464549899101257, max_rel=1.938576102256775, norm_rel=0.02059127576649189, ref_abs_avg=18.98015022277832, test_abs_avg=18.985641479492188
production_forward2 grad[81] vs paper_forward: mean_abs=0.5116616487503052, max_abs=4.0, mean_rel=0.15422891080379486, max_rel=1028.8817138671875, norm_rel=0.023757053539156914, ref_abs_avg=21.620914459228516, test_abs_avg=21.622745513916016
production_forward2 grad[82] vs paper_forward: mean_abs=0.4700719118118286, max_abs=3.75, mean_rel=0.20482458174228668, max_rel=1499.9998779296875, norm_rel=0.022108083590865135, ref_abs_avg=21.226125717163086, test_abs_avg=21.228382110595703
production_forward2 grad[83] vs paper_forward: mean_abs=0.3713569641113281, max_abs=1.5, mean_rel=0.21209989488124847, max_rel=36.79059982299805, norm_rel=0.02157055214047432, ref_abs_avg=17.326868057250977, test_abs_avg=17.336910247802734
production_forward2 grad[84] vs paper_forward: mean_abs=0.4792223572731018, max_abs=5.5, mean_rel=0.1485345959663391, max_rel=1443.98779296875, norm_rel=0.02298417128622532, ref_abs_avg=20.899038314819336, test_abs_avg=20.89966583251953
production_forward2 grad[85] vs paper_forward: mean_abs=0.4352005124092102, max_abs=4.0, mean_rel=0.1860455870628357, max_rel=1250.0, norm_rel=0.021643416956067085, ref_abs_avg=20.195497512817383, test_abs_avg=20.192108154296875
production_forward2 grad[86] vs paper_forward: mean_abs=0.3490248918533325, max_abs=1.625, mean_rel=0.1579298973083496, max_rel=38.3165397644043, norm_rel=0.02135067991912365, ref_abs_avg=16.553306579589844, test_abs_avg=16.54825210571289
production_forward2 grad[87] vs paper_forward: mean_abs=0.4506707191467285, max_abs=4.5, mean_rel=0.14029347896575928, max_rel=770.3521728515625, norm_rel=0.02247423492372036, ref_abs_avg=20.133056640625, test_abs_avg=20.133583068847656
production_forward2 grad[88] vs paper_forward: mean_abs=0.4069405794143677, max_abs=3.5, mean_rel=0.2030879259109497, max_rel=1531.2498779296875, norm_rel=0.02056940644979477, ref_abs_avg=19.8834228515625, test_abs_avg=19.883705139160156
production_forward2 grad[89] vs paper_forward: mean_abs=0.3239624500274658, max_abs=1.375, mean_rel=0.15873020887374878, max_rel=26.98370933532715, norm_rel=0.02004147134721279, ref_abs_avg=16.058557510375977, test_abs_avg=16.0526065826416
production_forward2 grad[90] vs paper_forward: mean_abs=0.4228596091270447, max_abs=4.0, mean_rel=0.13390745222568512, max_rel=703.881591796875, norm_rel=0.022017188370227814, ref_abs_avg=19.372718811035156, test_abs_avg=19.372817993164062
production_forward2 grad[91] vs paper_forward: mean_abs=0.38165849447250366, max_abs=3.25, mean_rel=0.1859039068222046, max_rel=1093.75, norm_rel=0.019993938505649567, ref_abs_avg=19.21910858154297, test_abs_avg=19.213428497314453
production_forward2 grad[92] vs paper_forward: mean_abs=0.30870676040649414, max_abs=1.09375, mean_rel=0.08302810788154602, max_rel=7.990971088409424, norm_rel=0.019104497507214546, ref_abs_avg=16.111005783081055, test_abs_avg=16.100814819335938
production_forward2 grad[93] vs paper_forward: mean_abs=0.3887917697429657, max_abs=5.0, mean_rel=0.1277962327003479, max_rel=409.5367126464844, norm_rel=0.02131270244717598, ref_abs_avg=18.448062896728516, test_abs_avg=18.449817657470703
production_forward2 grad[94] vs paper_forward: mean_abs=0.35288065671920776, max_abs=4.015625, mean_rel=0.17366990447044373, max_rel=1312.4998779296875, norm_rel=0.019240304827690125, ref_abs_avg=18.44879722595215, test_abs_avg=18.45562744140625
production_forward2 grad[95] vs paper_forward: mean_abs=0.29233837127685547, max_abs=1.25, mean_rel=0.09463615715503693, max_rel=12.931034088134766, norm_rel=0.01899738237261772, ref_abs_avg=15.597412109375, test_abs_avg=15.589521408081055
production_forward2 grad[96] vs paper_forward: mean_abs=0.38065898418426514, max_abs=4.1875, mean_rel=0.13215301930904388, max_rel=823.8925170898438, norm_rel=0.020917808637022972, ref_abs_avg=18.488773345947266, test_abs_avg=18.48745346069336
production_forward2 grad[97] vs paper_forward: mean_abs=0.33574092388153076, max_abs=5.0, mean_rel=0.17892596125602722, max_rel=1093.75, norm_rel=0.018954487517476082, ref_abs_avg=17.992095947265625, test_abs_avg=17.987842559814453
identity layers + randn queries
production_forward2 fwd+bwd:  191.513 ms
production_forward2 bwd-only: 172.454 ms
production_forward2 peak allocated: fwd=2.864 GiB, fwd+bwd=6.243 GiB
production_forward2 peak reserved:  fwd=3.223 GiB, fwd+bwd=8.973 GiB
production_forward fwd+bwd:  116.602 ms
production_forward bwd-only: 96.013 ms
production_forward peak allocated: fwd=3.368 GiB, fwd+bwd=10.493 GiB
production_forward peak reserved:  fwd=3.598 GiB, fwd+bwd=11.598 GiB
torch_compile_phases_forward fwd+bwd:  165.845 ms
torch_compile_phases_forward bwd-only: 132.492 ms
torch_compile_phases_forward peak allocated: fwd=13.078 GiB, fwd+bwd=13.706 GiB
torch_compile_phases_forward peak reserved:  fwd=13.375 GiB, fwd+bwd=17.627 GiB
paper_forward fwd+bwd:  384.382 ms
paper_forward bwd-only: 304.122 ms
paper_forward peak allocated: fwd=30.003 GiB, fwd+bwd=32.122 GiB
paper_forward peak reserved:  fwd=30.018 GiB, fwd+bwd=32.768 GiB
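
Note on the comparison fields: the log does not define the statistics it prints (mean_abs, max_abs, mean_rel, max_rel, norm_rel, ref_abs_avg, test_abs_avg). The sketch below is one plausible reading, not the script's actual code. The function name `compare_stats`, the `eps` guard, and the exact formulas are all assumptions, and plain Python lists stand in for tensors so the example is self-contained.

```python
import math
from typing import Dict, Sequence


def compare_stats(test: Sequence[float], ref: Sequence[float],
                  eps: float = 1e-12) -> Dict[str, float]:
    """Hypothetical reconstruction of the per-gradient statistics in the log.

    Assumed definitions (not confirmed by the log itself):
      *_abs   -> elementwise |test - ref|
      *_rel   -> elementwise |test - ref| / (|ref| + eps)
      norm_rel -> ||test - ref||_2 / (||ref||_2 + eps)
    """
    if len(test) != len(ref) or len(ref) == 0:
        raise ValueError("test and ref must be non-empty and the same length")

    n = len(ref)
    diffs = [abs(t - r) for t, r in zip(test, ref)]
    rels = [d / (abs(r) + eps) for d, r in zip(diffs, ref)]

    diff_norm = math.sqrt(sum(d * d for d in diffs))
    ref_norm = math.sqrt(sum(r * r for r in ref))

    return {
        "mean_abs": sum(diffs) / n,
        "max_abs": max(diffs),
        "mean_rel": sum(rels) / n,
        "max_rel": max(rels),
        "norm_rel": diff_norm / (ref_norm + eps),
        "ref_abs_avg": sum(abs(r) for r in ref) / n,
        "test_abs_avg": sum(abs(t) for t in test) / n,
    }


if __name__ == "__main__":
    # One near-zero reference element dominates max_rel while norm_rel stays small.
    stats = compare_stats(test=[2.001, 40.0], ref=[0.001, 40.0])
    for name, value in stats.items():
        print(f"{name}={value}")
```

Under these assumed definitions, a single reference element close to zero can push max_rel into the thousands while norm_rel stays at a few percent. That would be consistent with the pattern in the entries above, where norm_rel sits near 0.02 to 0.026 even when max_rel exceeds 1000.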

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016166024142876267, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008256696164608002, max_abs=0.421875, mean_rel=0.07193300127983093, max_rel=106.2410659790039, norm_rel=0.019706636667251587, ref_abs_avg=0.45481181144714355, test_abs_avg=0.45482978224754333
production_forward grad[1] vs paper_forward: mean_abs=7.192781925201416, max_abs=62.0, mean_rel=0.14219175279140472, max_rel=155.1742401123047, norm_rel=0.02019362337887287, ref_abs_avg=315.0055236816406, test_abs_avg=314.92938232421875
production_forward grad[2] vs paper_forward: mean_abs=1.1665563583374023, max_abs=4.0625, mean_rel=0.07799720764160156, max_rel=2.6522762775421143, norm_rel=0.02353481948375702, ref_abs_avg=48.83543395996094, test_abs_avg=48.90594482421875
production_forward grad[3] vs paper_forward: mean_abs=1.5964643955230713, max_abs=13.0, mean_rel=0.16543956100940704, max_rel=2431.426025390625, norm_rel=0.024374842643737793, ref_abs_avg=65.91633605957031, test_abs_avg=65.9149169921875
production_forward grad[4] vs paper_forward: mean_abs=1.4695242643356323, max_abs=8.625, mean_rel=0.36146706342697144, max_rel=4500.0, norm_rel=0.022792354226112366, ref_abs_avg=64.87712097167969, test_abs_avg=64.87191772460938
production_forward grad[5] vs paper_forward: mean_abs=1.0991060733795166, max_abs=4.25, mean_rel=0.22358910739421844, max_rel=69.46720123291016, norm_rel=0.02250974252820015, ref_abs_avg=49.07929992675781, test_abs_avg=49.091400146484375
production_forward grad[6] vs paper_forward: mean_abs=1.3859775066375732, max_abs=9.0, mean_rel=0.16721168160438538, max_rel=1688.1370849609375, norm_rel=0.02416125126183033, ref_abs_avg=57.72821807861328, test_abs_avg=57.73290252685547
production_forward grad[7] vs paper_forward: mean_abs=1.2832398414611816, max_abs=8.0, mean_rel=0.40657925605773926, max_rel=3984.374755859375, norm_rel=0.02270531840622425, ref_abs_avg=56.76704406738281, test_abs_avg=56.76881408691406
production_forward grad[8] vs paper_forward: mean_abs=0.9844636917114258, max_abs=4.25, mean_rel=0.08189418166875839, max_rel=8.301872253417969, norm_rel=0.022160690277814865, ref_abs_avg=44.5574951171875, test_abs_avg=44.48906326293945
production_forward grad[9] vs paper_forward: mean_abs=1.2553045749664307, max_abs=9.0, mean_rel=0.16074270009994507, max_rel=1135.68212890625, norm_rel=0.024009771645069122, ref_abs_avg=52.64033508300781, test_abs_avg=52.6454963684082
production_forward grad[10] vs paper_forward: mean_abs=1.1501084566116333, max_abs=6.75, mean_rel=0.35151207447052, max_rel=3406.249755859375, norm_rel=0.02221299521625042, ref_abs_avg=52.12464141845703, test_abs_avg=52.129432678222656
production_forward grad[11] vs paper_forward: mean_abs=0.8993792533874512, max_abs=3.75, mean_rel=0.09740471839904785, max_rel=15.092879295349121, norm_rel=0.021889137104153633, ref_abs_avg=40.86827850341797, test_abs_avg=40.86565399169922
production_forward grad[12] vs paper_forward: mean_abs=1.1569974422454834, max_abs=8.0, mean_rel=0.16529713571071625, max_rel=2281.03759765625, norm_rel=0.02379528433084488, ref_abs_avg=48.98662567138672, test_abs_avg=48.98950958251953
production_forward grad[13] vs paper_forward: mean_abs=1.072218418121338, max_abs=6.75, mean_rel=0.34843504428863525, max_rel=3499.999755859375, norm_rel=0.02224883809685707, ref_abs_avg=48.3834114074707, test_abs_avg=48.38689422607422
production_forward grad[14] vs paper_forward: mean_abs=0.8375163078308105, max_abs=3.5, mean_rel=0.1210373267531395, max_rel=18.71406364440918, norm_rel=0.023204462602734566, ref_abs_avg=36.00436782836914, test_abs_avg=35.99600601196289
production_forward grad[15] vs paper_forward: mean_abs=1.0870468616485596, max_abs=8.0, mean_rel=0.1528625786304474, max_rel=1373.9931640625, norm_rel=0.023694312199950218, ref_abs_avg=46.23767852783203, test_abs_avg=46.24131774902344
production_forward grad[16] vs paper_forward: mean_abs=0.995105504989624, max_abs=6.0, mean_rel=0.2456180304288864, max_rel=2437.5, norm_rel=0.021998202428221703, ref_abs_avg=45.48958969116211, test_abs_avg=45.482810974121094
production_forward grad[17] vs paper_forward: mean_abs=0.8352384567260742, max_abs=4.234375, mean_rel=0.1287127584218979, max_rel=17.487489700317383, norm_rel=0.023180536925792694, ref_abs_avg=35.97944259643555, test_abs_avg=35.95220184326172
production_forward grad[18] vs paper_forward: mean_abs=1.0269155502319336, max_abs=7.0, mean_rel=0.15897361934185028, max_rel=1639.01904296875, norm_rel=0.02354475110769272, ref_abs_avg=43.86655044555664, test_abs_avg=43.86890411376953
production_forward grad[19] vs paper_forward: mean_abs=0.9457100033760071, max_abs=6.0, mean_rel=0.3134651184082031, max_rel=2828.124755859375, norm_rel=0.021955374628305435, ref_abs_avg=43.249839782714844, test_abs_avg=43.25801086425781
production_forward grad[20] vs paper_forward: mean_abs=0.7369897365570068, max_abs=2.84375, mean_rel=0.3779706358909607, max_rel=120.48101806640625, norm_rel=0.02127329632639885, ref_abs_avg=34.75629425048828, test_abs_avg=34.72734832763672
production_forward grad[21] vs paper_forward: mean_abs=0.9783275723457336, max_abs=7.0, mean_rel=0.15910020470619202, max_rel=1216.45947265625, norm_rel=0.023522749543190002, ref_abs_avg=41.84503936767578, test_abs_avg=41.84844970703125
production_forward grad[22] vs paper_forward: mean_abs=0.8945478200912476, max_abs=6.0, mean_rel=0.2771347165107727, max_rel=2125.0, norm_rel=0.021862274035811424, ref_abs_avg=41.09926223754883, test_abs_avg=41.10052490234375
production_forward grad[23] vs paper_forward: mean_abs=0.731276273727417, max_abs=2.515625, mean_rel=0.2440938949584961, max_rel=41.54804992675781, norm_rel=0.022819489240646362, ref_abs_avg=32.29875183105469, test_abs_avg=32.38774108886719
production_forward grad[24] vs paper_forward: mean_abs=0.9217813014984131, max_abs=6.0, mean_rel=0.155829519033432, max_rel=2013.383056640625, norm_rel=0.023361533880233765, ref_abs_avg=39.6866340637207, test_abs_avg=39.69142150878906
production_forward grad[25] vs paper_forward: mean_abs=0.8494199514389038, max_abs=5.0, mean_rel=0.26017168164253235, max_rel=2874.999755859375, norm_rel=0.021712670102715492, ref_abs_avg=39.316078186035156, test_abs_avg=39.31758117675781
production_forward grad[26] vs paper_forward: mean_abs=0.7803951501846313, max_abs=3.125, mean_rel=0.08122047781944275, max_rel=5.344645023345947, norm_rel=0.024360978975892067, ref_abs_avg=32.370418548583984, test_abs_avg=32.329315185546875
production_forward grad[27] vs paper_forward: mean_abs=1.0438523292541504, max_abs=7.0, mean_rel=0.16431443393230438, max_rel=1468.7464599609375, norm_rel=0.025351781398057938, ref_abs_avg=41.405975341796875, test_abs_avg=41.406856536865234
production_forward grad[28] vs paper_forward: mean_abs=0.9708993434906006, max_abs=5.8125, mean_rel=0.35178807377815247, max_rel=3624.999755859375, norm_rel=0.023602496832609177, ref_abs_avg=41.32453155517578, test_abs_avg=41.32477569580078
production_forward grad[29] vs paper_forward: mean_abs=0.7489166259765625, max_abs=2.7734375, mean_rel=0.10075804591178894, max_rel=3.3541035652160645, norm_rel=0.02405184507369995, ref_abs_avg=31.15803337097168, test_abs_avg=31.14388084411621
production_forward grad[30] vs paper_forward: mean_abs=0.9803692102432251, max_abs=6.0, mean_rel=0.1907404363155365, max_rel=1897.8150634765625, norm_rel=0.025710059329867363, ref_abs_avg=38.30419921875, test_abs_avg=38.303653717041016
production_forward grad[31] vs paper_forward: mean_abs=0.9156550765037537, max_abs=5.75, mean_rel=0.27840036153793335, max_rel=2812.499755859375, norm_rel=0.024062758311629295, ref_abs_avg=38.12379455566406, test_abs_avg=38.12568664550781
production_forward grad[32] vs paper_forward: mean_abs=0.7110347747802734, max_abs=2.75, mean_rel=0.08142302930355072, max_rel=2.582043409347534, norm_rel=0.023848433047533035, ref_abs_avg=29.485721588134766, test_abs_avg=29.481962203979492
production_forward grad[33] vs paper_forward: mean_abs=0.9163405895233154, max_abs=6.0, mean_rel=0.17873254418373108, max_rel=1651.0584716796875, norm_rel=0.02571169100701809, ref_abs_avg=35.805809020996094, test_abs_avg=35.808021545410156
production_forward grad[34] vs paper_forward: mean_abs=0.8533685803413391, max_abs=5.3125, mean_rel=0.2860146164894104, max_rel=2687.499755859375, norm_rel=0.02408629283308983, ref_abs_avg=35.57959747314453, test_abs_avg=35.59053039550781
production_forward grad[35] vs paper_forward: mean_abs=0.7121717929840088, max_abs=3.6640625, mean_rel=0.21441766619682312, max_rel=40.00529861450195, norm_rel=0.024221818894147873, ref_abs_avg=29.39316749572754, test_abs_avg=29.395097732543945
production_forward grad[36] vs paper_forward: mean_abs=0.8645312786102295, max_abs=6.0, mean_rel=0.16966135799884796, max_rel=1213.83984375, norm_rel=0.02547810785472393, ref_abs_avg=34.09892272949219, test_abs_avg=34.09778594970703
production_forward grad[37] vs paper_forward: mean_abs=0.8080577254295349, max_abs=6.0, mean_rel=0.2894406318664551, max_rel=2874.999755859375, norm_rel=0.023886144161224365, ref_abs_avg=33.926490783691406, test_abs_avg=33.926536560058594
production_forward grad[38] vs paper_forward: mean_abs=0.6352653503417969, max_abs=3.0, mean_rel=0.10354785621166229, max_rel=8.893084526062012, norm_rel=0.023873131722211838, ref_abs_avg=27.7581787109375, test_abs_avg=27.738876342773438
production_forward grad[39] vs paper_forward: mean_abs=0.8219389915466309, max_abs=5.5, mean_rel=0.1653679758310318, max_rel=1517.6162109375, norm_rel=0.025157004594802856, ref_abs_avg=32.768310546875, test_abs_avg=32.76936340332031
production_forward grad[40] vs paper_forward: mean_abs=0.7641297578811646, max_abs=4.921875, mean_rel=0.25566381216049194, max_rel=1812.4998779296875, norm_rel=0.023670630529522896, ref_abs_avg=32.347808837890625, test_abs_avg=32.33903503417969
production_forward grad[41] vs paper_forward: mean_abs=0.5998560190200806, max_abs=2.0, mean_rel=0.1556321382522583, max_rel=20.08951187133789, norm_rel=0.022690309211611748, ref_abs_avg=25.69757652282715, test_abs_avg=25.69416046142578
production_forward grad[42] vs paper_forward: mean_abs=0.7755101323127747, max_abs=5.25, mean_rel=0.16054770350456238, max_rel=1070.2659912109375, norm_rel=0.02504909411072731, ref_abs_avg=31.066022872924805, test_abs_avg=31.06772804260254
production_forward grad[43] vs paper_forward: mean_abs=0.7209880352020264, max_abs=4.875, mean_rel=0.3008996844291687, max_rel=2312.5, norm_rel=0.02379092387855053, ref_abs_avg=30.383153915405273, test_abs_avg=30.381074905395508
production_forward grad[44] vs paper_forward: mean_abs=0.5783472061157227, max_abs=2.546875, mean_rel=0.09025584906339645, max_rel=5.709482669830322, norm_rel=0.022899018600583076, ref_abs_avg=25.215368270874023, test_abs_avg=25.212108612060547
production_forward grad[45] vs paper_forward: mean_abs=0.7399398684501648, max_abs=5.0, mean_rel=0.16909313201904297, max_rel=1508.6805419921875, norm_rel=0.024813417345285416, ref_abs_avg=29.932769775390625, test_abs_avg=29.936023712158203
production_forward grad[46] vs paper_forward: mean_abs=0.6875601410865784, max_abs=4.5, mean_rel=0.27233201265335083, max_rel=2906.249755859375, norm_rel=0.023182610049843788, ref_abs_avg=29.719924926757812, test_abs_avg=29.726428985595703
production_forward grad[47] vs paper_forward: mean_abs=0.544032096862793, max_abs=2.5, mean_rel=0.14439165592193604, max_rel=11.654302597045898, norm_rel=0.02154683694243431, ref_abs_avg=24.84876251220703, test_abs_avg=24.83678436279297
production_forward grad[48] vs paper_forward: mean_abs=0.705432116985321, max_abs=5.5, mean_rel=0.14856138825416565, max_rel=1051.46875, norm_rel=0.024436114355921745, ref_abs_avg=28.958341598510742, test_abs_avg=28.960399627685547
production_forward grad[49] vs paper_forward: mean_abs=0.6558380722999573, max_abs=4.625, mean_rel=0.2870181202888489, max_rel=2062.5, norm_rel=0.023213539272546768, ref_abs_avg=28.30425262451172, test_abs_avg=28.299880981445312
production_forward grad[50] vs paper_forward: mean_abs=0.6426429748535156, max_abs=2.625, mean_rel=0.1237064003944397, max_rel=14.494729042053223, norm_rel=0.02498280443251133, ref_abs_avg=26.016277313232422, test_abs_avg=25.960330963134766
production_forward grad[51] vs paper_forward: mean_abs=0.793087363243103, max_abs=6.0, mean_rel=0.1599273681640625, max_rel=1260.907958984375, norm_rel=0.025780411437153816, ref_abs_avg=30.89609146118164, test_abs_avg=30.89815902709961
production_forward grad[52] vs paper_forward: mean_abs=0.7418040037155151, max_abs=4.75, mean_rel=0.24056769907474518, max_rel=2062.5, norm_rel=0.024309339001774788, ref_abs_avg=30.63127326965332, test_abs_avg=30.63543701171875
production_forward grad[53] vs paper_forward: mean_abs=0.5616512298583984, max_abs=2.0625, mean_rel=0.09733821451663971, max_rel=6.561161994934082, norm_rel=0.02393559366464615, ref_abs_avg=23.375551223754883, test_abs_avg=23.409236907958984
production_forward grad[54] vs paper_forward: mean_abs=0.7240079045295715, max_abs=5.0, mean_rel=0.16719147562980652, max_rel=1393.148193359375, norm_rel=0.025211405009031296, ref_abs_avg=28.785606384277344, test_abs_avg=28.785972595214844
production_forward grad[55] vs paper_forward: mean_abs=0.6778509020805359, max_abs=5.0, mean_rel=0.2502211332321167, max_rel=2062.5, norm_rel=0.02335326373577118, ref_abs_avg=29.075485229492188, test_abs_avg=29.071861267089844
production_forward grad[56] vs paper_forward: mean_abs=0.505821704864502, max_abs=2.5, mean_rel=0.11963147670030594, max_rel=12.501625061035156, norm_rel=0.02252974733710289, ref_abs_avg=23.170658111572266, test_abs_avg=23.200271606445312
production_forward grad[57] vs paper_forward: mean_abs=0.6819114685058594, max_abs=5.5, mean_rel=0.16509698331356049, max_rel=1664.0595703125, norm_rel=0.024805111810564995, ref_abs_avg=27.518192291259766, test_abs_avg=27.51833724975586
production_forward grad[58] vs paper_forward: mean_abs=0.6334021091461182, max_abs=4.0, mean_rel=0.2532862722873688, max_rel=1765.6248779296875, norm_rel=0.023191142827272415, ref_abs_avg=27.346527099609375, test_abs_avg=27.349761962890625
production_forward grad[59] vs paper_forward: mean_abs=0.49712347984313965, max_abs=1.75, mean_rel=0.13490566611289978, max_rel=24.476682662963867, norm_rel=0.024190744385123253, ref_abs_avg=21.112316131591797, test_abs_avg=21.139019012451172
production_forward grad[60] vs paper_forward: mean_abs=0.6339812278747559, max_abs=5.5, mean_rel=0.16718721389770508, max_rel=1974.884765625, norm_rel=0.02440761588513851, ref_abs_avg=26.03152847290039, test_abs_avg=26.031341552734375
production_forward grad[61] vs paper_forward: mean_abs=0.5904874205589294, max_abs=4.5, mean_rel=0.23622605204582214, max_rel=1374.9998779296875, norm_rel=0.02298152633011341, ref_abs_avg=25.69453239440918, test_abs_avg=25.697067260742188
production_forward grad[62] vs paper_forward: mean_abs=0.4563443660736084, max_abs=1.8125, mean_rel=0.1610005497932434, max_rel=17.102256774902344, norm_rel=0.022878771647810936, ref_abs_avg=20.371484756469727, test_abs_avg=20.33251190185547
production_forward grad[63] vs paper_forward: mean_abs=0.6009849309921265, max_abs=4.5, mean_rel=0.16080068051815033, max_rel=836.6971435546875, norm_rel=0.023992782458662987, ref_abs_avg=25.100330352783203, test_abs_avg=25.102920532226562
production_forward grad[64] vs paper_forward: mean_abs=0.5539113879203796, max_abs=3.5625, mean_rel=0.22771555185317993, max_rel=1749.9998779296875, norm_rel=0.02244427613914013, ref_abs_avg=24.676599502563477, test_abs_avg=24.680416107177734
production_forward grad[65] vs paper_forward: mean_abs=0.4353470802307129, max_abs=1.75, mean_rel=0.08688969165086746, max_rel=2.8509745597839355, norm_rel=0.022476129233837128, ref_abs_avg=19.425987243652344, test_abs_avg=19.432092666625977
production_forward grad[66] vs paper_forward: mean_abs=0.5690815448760986, max_abs=5.0, mean_rel=0.14633773267269135, max_rel=1087.625, norm_rel=0.023694723844528198, ref_abs_avg=24.03264617919922, test_abs_avg=24.03215789794922
production_forward grad[67] vs paper_forward: mean_abs=0.5287469625473022, max_abs=4.25, mean_rel=0.1739090532064438, max_rel=1531.2498779296875, norm_rel=0.02195325680077076, ref_abs_avg=24.08206558227539, test_abs_avg=24.088037490844727
production_forward grad[68] vs paper_forward: mean_abs=0.4390316605567932, max_abs=1.75, mean_rel=0.7006804943084717, max_rel=278.69891357421875, norm_rel=0.022388387471437454, ref_abs_avg=19.581462860107422, test_abs_avg=19.595596313476562
production_forward grad[69] vs paper_forward: mean_abs=0.5475543737411499, max_abs=4.0, mean_rel=0.14295221865177155, max_rel=982.1370849609375, norm_rel=0.023167544975876808, ref_abs_avg=23.619712829589844, test_abs_avg=23.623537063598633
production_forward grad[70] vs paper_forward: mean_abs=0.5042714476585388, max_abs=3.75, mean_rel=0.21156609058380127, max_rel=2125.0, norm_rel=0.021481748670339584, ref_abs_avg=23.435043334960938, test_abs_avg=23.426055908203125
production_forward grad[71] vs paper_forward: mean_abs=0.4010279178619385, max_abs=1.625, mean_rel=0.11966685950756073, max_rel=27.702861785888672, norm_rel=0.021726245060563087, ref_abs_avg=19.1141357421875, test_abs_avg=19.11705780029297
production_forward grad[72] vs paper_forward: mean_abs=0.5231934785842896, max_abs=4.0, mean_rel=0.14662794768810272, max_rel=958.2205200195312, norm_rel=0.0229008961468935, ref_abs_avg=22.87748908996582, test_abs_avg=22.879220962524414
production_forward grad[73] vs paper_forward: mean_abs=0.475240021944046, max_abs=3.5, mean_rel=0.1973138004541397, max_rel=1281.25, norm_rel=0.021468032151460648, ref_abs_avg=22.125816345214844, test_abs_avg=22.129627227783203
production_forward grad[74] vs paper_forward: mean_abs=0.47491979598999023, max_abs=2.001953125, mean_rel=0.10322961211204529, max_rel=5.434054851531982, norm_rel=0.024075547233223915, ref_abs_avg=19.961383819580078, test_abs_avg=19.97041130065918
production_forward grad[75] vs paper_forward: mean_abs=0.5687137842178345, max_abs=4.5, mean_rel=0.15107107162475586, max_rel=862.4340209960938, norm_rel=0.02424968034029007, ref_abs_avg=23.48879623413086, test_abs_avg=23.491487503051758
production_forward grad[76] vs paper_forward: mean_abs=0.5202189087867737, max_abs=3.75, mean_rel=0.22811146080493927, max_rel=1679.6873779296875, norm_rel=0.022599665448069572, ref_abs_avg=23.102611541748047, test_abs_avg=23.11007308959961
production_forward grad[77] vs paper_forward: mean_abs=0.39429372549057007, max_abs=1.5, mean_rel=0.1752656102180481, max_rel=34.30177307128906, norm_rel=0.02130999229848385, ref_abs_avg=17.91692543029785, test_abs_avg=17.967288970947266
production_forward grad[78] vs paper_forward: mean_abs=0.5202750563621521, max_abs=5.5, mean_rel=0.14650657773017883, max_rel=643.9239501953125, norm_rel=0.023931507021188736, ref_abs_avg=21.818984985351562, test_abs_avg=21.82064437866211
production_forward grad[79] vs paper_forward: mean_abs=0.4769580662250519, max_abs=4.25, mean_rel=0.19792678952217102, max_rel=1203.125, norm_rel=0.022251933813095093, ref_abs_avg=21.560178756713867, test_abs_avg=21.56338882446289
production_forward grad[80] vs paper_forward: mean_abs=0.37487852573394775, max_abs=1.875, mean_rel=0.06363232433795929, max_rel=1.9687002897262573, norm_rel=0.02144540287554264, ref_abs_avg=18.024023056030273, test_abs_avg=18.039724349975586
production_forward grad[81] vs paper_forward: mean_abs=0.48845216631889343, max_abs=5.125, mean_rel=0.14269277453422546, max_rel=1016.8063354492188, norm_rel=0.02316160872578621, ref_abs_avg=21.12454605102539, test_abs_avg=21.12432861328125
production_forward grad[82] vs paper_forward: mean_abs=0.4409688711166382, max_abs=3.25, mean_rel=0.20924687385559082, max_rel=1593.7498779296875, norm_rel=0.020983699709177017, ref_abs_avg=20.956138610839844, test_abs_avg=20.951847076416016
production_forward grad[83] vs paper_forward: mean_abs=0.3628873825073242, max_abs=1.53125, mean_rel=0.08137192577123642, max_rel=3.607611656188965, norm_rel=0.02252722904086113, ref_abs_avg=15.941532135009766, test_abs_avg=15.969761848449707
production_forward grad[84] vs paper_forward: mean_abs=0.45734983682632446, max_abs=4.25, mean_rel=0.1413019895553589, max_rel=1217.5589599609375, norm_rel=0.022814033553004265, ref_abs_avg=20.137493133544922, test_abs_avg=20.138004302978516
production_forward grad[85] vs paper_forward: mean_abs=0.4193496108055115, max_abs=4.0, mean_rel=0.1797667145729065, max_rel=1374.9998779296875, norm_rel=0.021836452186107635, ref_abs_avg=19.371784210205078, test_abs_avg=19.374343872070312
production_forward grad[86] vs paper_forward: mean_abs=0.344531774520874, max_abs=1.375, mean_rel=0.09104699641466141, max_rel=7.105783462524414, norm_rel=0.020892977714538574, ref_abs_avg=16.784528732299805, test_abs_avg=16.75758934020996
production_forward grad[87] vs paper_forward: mean_abs=0.43214714527130127, max_abs=3.875, mean_rel=0.1320473849773407, max_rel=701.0122680664062, norm_rel=0.021941540762782097, ref_abs_avg=19.788219451904297, test_abs_avg=19.790069580078125
production_forward grad[88] vs paper_forward: mean_abs=0.3837716579437256, max_abs=3.5, mean_rel=0.17180736362934113, max_rel=1499.9998779296875, norm_rel=0.02010863833129406, ref_abs_avg=19.198863983154297, test_abs_avg=19.207365036010742
production_forward grad[89] vs paper_forward: mean_abs=0.3228492736816406, max_abs=1.203125, mean_rel=0.07168592512607574, max_rel=6.100867748260498, norm_rel=0.019576624035835266, ref_abs_avg=16.632568359375, test_abs_avg=16.636219024658203
production_forward grad[90] vs paper_forward: mean_abs=0.4016498327255249, max_abs=4.25, mean_rel=0.12618035078048706, max_rel=667.9237060546875, norm_rel=0.021567529067397118, ref_abs_avg=18.789508819580078, test_abs_avg=18.78952407836914
production_forward grad[91] vs paper_forward: mean_abs=0.36705946922302246, max_abs=4.0, mean_rel=0.16347754001617432, max_rel=1250.0, norm_rel=0.020207734778523445, ref_abs_avg=18.354978561401367, test_abs_avg=18.35125160217285
production_forward grad[92] vs paper_forward: mean_abs=0.3069443702697754, max_abs=1.09375, mean_rel=0.0718584954738617, max_rel=2.872055768966675, norm_rel=0.02081485651433468, ref_abs_avg=14.427091598510742, test_abs_avg=14.427203178405762
production_forward grad[93] vs paper_forward: mean_abs=0.3852422535419464, max_abs=4.0, mean_rel=0.12677359580993652, max_rel=660.6466674804688, norm_rel=0.02157333493232727, ref_abs_avg=18.046363830566406, test_abs_avg=18.049373626708984
production_forward grad[94] vs paper_forward: mean_abs=0.35731422901153564, max_abs=4.25, mean_rel=0.16605135798454285, max_rel=1531.2498779296875, norm_rel=0.019872648641467094, ref_abs_avg=18.157564163208008, test_abs_avg=18.16346549987793
production_forward grad[95] vs paper_forward: mean_abs=0.280862033367157, max_abs=1.0625, mean_rel=0.08195485919713974, max_rel=5.1442437171936035, norm_rel=0.018514219671487808, ref_abs_avg=15.696924209594727, test_abs_avg=15.697011947631836
production_forward grad[96] vs paper_forward: mean_abs=0.3729966878890991, max_abs=4.0, mean_rel=0.1235806867480278, max_rel=494.5987854003906, norm_rel=0.021205762401223183, ref_abs_avg=17.851943969726562, test_abs_avg=17.851787567138672
production_forward grad[97] vs paper_forward: mean_abs=0.34444737434387207, max_abs=4.0, mean_rel=0.1668352335691452, max_rel=1125.0, norm_rel=0.020398588851094246, ref_abs_avg=17.353809356689453, test_abs_avg=17.356830596923828
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016210910398513079, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008607599884271622, max_abs=0.40625, mean_rel=0.07464693486690521, max_rel=100.83943176269531, norm_rel=0.02042287215590477, ref_abs_avg=0.45481181144714355, test_abs_avg=0.4548136591911316
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.390300750732422, max_abs=60.0, mean_rel=0.14274586737155914, max_rel=216.4485321044922, norm_rel=0.02070724032819271, ref_abs_avg=315.0055236816406, test_abs_avg=314.99658203125
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.210993766784668, max_abs=5.0625, mean_rel=0.09200679510831833, max_rel=3.460822343826294, norm_rel=0.024573910981416702, ref_abs_avg=48.83543395996094, test_abs_avg=48.852622985839844
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6482781171798706, max_abs=12.0, mean_rel=0.16910409927368164, max_rel=2826.327392578125, norm_rel=0.02516421489417553, ref_abs_avg=65.91633605957031, test_abs_avg=65.90870666503906
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5242466926574707, max_abs=10.0, mean_rel=0.3588019013404846, max_rel=3874.999755859375, norm_rel=0.023643460124731064, ref_abs_avg=64.87712097167969, test_abs_avg=64.87400817871094
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1215507984161377, max_abs=4.25, mean_rel=0.21170476078987122, max_rel=63.73301696777344, norm_rel=0.022830737754702568, ref_abs_avg=49.07929992675781, test_abs_avg=49.10149383544922
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4300600290298462, max_abs=9.0, mean_rel=0.17092876136302948, max_rel=1410.592041015625, norm_rel=0.024903716519474983, ref_abs_avg=57.72821807861328, test_abs_avg=57.730224609375
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3278144598007202, max_abs=8.0, mean_rel=0.40389811992645264, max_rel=3984.374755859375, norm_rel=0.023484114557504654, ref_abs_avg=56.76704406738281, test_abs_avg=56.76763153076172
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0433778762817383, max_abs=4.25, mean_rel=0.10060253739356995, max_rel=15.10729694366455, norm_rel=0.023190222680568695, ref_abs_avg=44.5574951171875, test_abs_avg=44.44398880004883
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2936112880706787, max_abs=9.0, mean_rel=0.1620011329650879, max_rel=1395.143310546875, norm_rel=0.024725740775465965, ref_abs_avg=52.64033508300781, test_abs_avg=52.641510009765625
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1920948028564453, max_abs=8.0, mean_rel=0.3681691288948059, max_rel=3249.999755859375, norm_rel=0.02301848493516445, ref_abs_avg=52.12464141845703, test_abs_avg=52.124053955078125
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9321613311767578, max_abs=3.75, mean_rel=0.1004364863038063, max_rel=14.31888484954834, norm_rel=0.022550130262970924, ref_abs_avg=40.86827850341797, test_abs_avg=40.89405822753906
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.192162275314331, max_abs=9.0, mean_rel=0.16627752780914307, max_rel=2051.9013671875, norm_rel=0.024505970999598503, ref_abs_avg=48.98662567138672, test_abs_avg=48.986671447753906
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.104658603668213, max_abs=7.0, mean_rel=0.36934706568717957, max_rel=4125.0, norm_rel=0.022935952991247177, ref_abs_avg=48.3834114074707, test_abs_avg=48.3835334777832
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8125705718994141, max_abs=4.0, mean_rel=0.11037468165159225, max_rel=15.169036865234375, norm_rel=0.02333027310669422, ref_abs_avg=36.00436782836914, test_abs_avg=35.979251861572266
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1159462928771973, max_abs=8.5, mean_rel=0.15848560631275177, max_rel=840.6449584960938, norm_rel=0.024303412064909935, ref_abs_avg=46.23767852783203, test_abs_avg=46.24013137817383
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0247581005096436, max_abs=6.0, mean_rel=0.2593875527381897, max_rel=3156.249755859375, norm_rel=0.022655244916677475, ref_abs_avg=45.48958969116211, test_abs_avg=45.481231689453125
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8262083530426025, max_abs=3.078125, mean_rel=0.11471092700958252, max_rel=13.723750114440918, norm_rel=0.02290646731853485, ref_abs_avg=35.97944259643555, test_abs_avg=35.952667236328125
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0550117492675781, max_abs=7.5, mean_rel=0.16474497318267822, max_rel=1461.5972900390625, norm_rel=0.024185122922062874, ref_abs_avg=43.86655044555664, test_abs_avg=43.86650085449219
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.972394585609436, max_abs=5.5625, mean_rel=0.2962645888328552, max_rel=3124.999755859375, norm_rel=0.022573018446564674, ref_abs_avg=43.249839782714844, test_abs_avg=43.256858825683594
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7845638990402222, max_abs=3.625, mean_rel=0.5202661752700806, max_rel=208.89016723632812, norm_rel=0.022624697536230087, ref_abs_avg=34.75629425048828, test_abs_avg=34.7774658203125
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0015151500701904, max_abs=7.0, mean_rel=0.1647181510925293, max_rel=1561.805419921875, norm_rel=0.024066748097538948, ref_abs_avg=41.84503936767578, test_abs_avg=41.846195220947266
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9212807416915894, max_abs=6.0, mean_rel=0.2648922801017761, max_rel=2562.5, norm_rel=0.02249251864850521, ref_abs_avg=41.09926223754883, test_abs_avg=41.10114288330078
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7268993854522705, max_abs=2.671875, mean_rel=0.2647344172000885, max_rel=49.13956832885742, norm_rel=0.022780772298574448, ref_abs_avg=32.29875183105469, test_abs_avg=32.34980010986328
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9424127340316772, max_abs=6.0, mean_rel=0.1532316356897354, max_rel=1391.8433837890625, norm_rel=0.023882366716861725, ref_abs_avg=39.6866340637207, test_abs_avg=39.68870544433594
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8712190389633179, max_abs=5.625, mean_rel=0.28152382373809814, max_rel=2187.5, norm_rel=0.02226988971233368, ref_abs_avg=39.316078186035156, test_abs_avg=39.31724548339844
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8334445953369141, max_abs=3.0, mean_rel=0.09646320343017578, max_rel=6.903193473815918, norm_rel=0.025853555649518967, ref_abs_avg=32.370418548583984, test_abs_avg=32.38591003417969
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0699994564056396, max_abs=7.0, mean_rel=0.16972237825393677, max_rel=1696.2261962890625, norm_rel=0.02599189803004265, ref_abs_avg=41.405975341796875, test_abs_avg=41.40177536010742
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.9957393407821655, max_abs=7.0, mean_rel=0.39779132604599, max_rel=3749.999755859375, norm_rel=0.02419567108154297, ref_abs_avg=41.32453155517578, test_abs_avg=41.32227325439453
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7523555755615234, max_abs=2.5078125, mean_rel=0.10186371207237244, max_rel=4.375133991241455, norm_rel=0.024428365752100945, ref_abs_avg=31.15803337097168, test_abs_avg=31.132522583007812
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.000403881072998, max_abs=6.5, mean_rel=0.18946555256843567, max_rel=2103.003662109375, norm_rel=0.02622995153069496, ref_abs_avg=38.30419921875, test_abs_avg=38.30317687988281
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9392551183700562, max_abs=6.25, mean_rel=0.28413495421409607, max_rel=2624.999755859375, norm_rel=0.024701213464140892, ref_abs_avg=38.12379455566406, test_abs_avg=38.123966217041016
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7367877960205078, max_abs=3.125, mean_rel=0.08111832290887833, max_rel=2.883910655975342, norm_rel=0.024785561487078667, ref_abs_avg=29.485721588134766, test_abs_avg=29.415494918823242
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9345044493675232, max_abs=6.75, mean_rel=0.18482649326324463, max_rel=1282.9295654296875, norm_rel=0.026225194334983826, ref_abs_avg=35.805809020996094, test_abs_avg=35.80622100830078
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8730570077896118, max_abs=5.375, mean_rel=0.29489463567733765, max_rel=2968.749755859375, norm_rel=0.024619244039058685, ref_abs_avg=35.57959747314453, test_abs_avg=35.58732223510742
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7335503101348877, max_abs=2.8203125, mean_rel=0.23143720626831055, max_rel=45.65555953979492, norm_rel=0.024628713726997375, ref_abs_avg=29.39316749572754, test_abs_avg=29.365936279296875
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8798607587814331, max_abs=6.0, mean_rel=0.17416130006313324, max_rel=1377.8612060546875, norm_rel=0.02590789459645748, ref_abs_avg=34.09892272949219, test_abs_avg=34.097503662109375
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8238106966018677, max_abs=5.75, mean_rel=0.29947054386138916, max_rel=2874.999755859375, norm_rel=0.02434389851987362, ref_abs_avg=33.926490783691406, test_abs_avg=33.927162170410156
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6406345367431641, max_abs=3.109375, mean_rel=0.10588774085044861, max_rel=8.09817123413086, norm_rel=0.023674586787819862, ref_abs_avg=27.7581787109375, test_abs_avg=27.7673397064209
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8367561101913452, max_abs=6.0, mean_rel=0.16993358731269836, max_rel=1438.80029296875, norm_rel=0.02562088891863823, ref_abs_avg=32.768310546875, test_abs_avg=32.76929473876953
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7824534773826599, max_abs=5.0, mean_rel=0.2748095691204071, max_rel=2156.25, norm_rel=0.024221325293183327, ref_abs_avg=32.347808837890625, test_abs_avg=32.33999252319336
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6043343544006348, max_abs=2.25, mean_rel=0.14613191783428192, max_rel=16.321937561035156, norm_rel=0.02298334799706936, ref_abs_avg=25.69757652282715, test_abs_avg=25.672454833984375
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7874138355255127, max_abs=6.0, mean_rel=0.16208787262439728, max_rel=1190.57763671875, norm_rel=0.025431513786315918, ref_abs_avg=31.066022872924805, test_abs_avg=31.067245483398438
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7345274686813354, max_abs=4.5, mean_rel=0.31083619594573975, max_rel=2312.5, norm_rel=0.024237899109721184, ref_abs_avg=30.383153915405273, test_abs_avg=30.3790340423584
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5990316867828369, max_abs=2.25, mean_rel=0.10262150317430496, max_rel=4.289500713348389, norm_rel=0.023690328001976013, ref_abs_avg=25.215368270874023, test_abs_avg=25.208045959472656
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.750302791595459, max_abs=5.0, mean_rel=0.16596919298171997, max_rel=1748.3619384765625, norm_rel=0.025175033137202263, ref_abs_avg=29.932769775390625, test_abs_avg=29.93505859375
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6983938217163086, max_abs=4.25, mean_rel=0.286077082157135, max_rel=2624.999755859375, norm_rel=0.02356863021850586, ref_abs_avg=29.719924926757812, test_abs_avg=29.726455688476562
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5488309860229492, max_abs=2.5, mean_rel=0.13220009207725525, max_rel=12.349793434143066, norm_rel=0.02204481139779091, ref_abs_avg=24.84876251220703, test_abs_avg=24.845272064208984
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7144209146499634, max_abs=6.0, mean_rel=0.14846567809581757, max_rel=1059.253662109375, norm_rel=0.024756092578172684, ref_abs_avg=28.958341598510742, test_abs_avg=28.96025848388672
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6635478734970093, max_abs=4.5, mean_rel=0.30459243059158325, max_rel=2562.5, norm_rel=0.023461107164621353, ref_abs_avg=28.30425262451172, test_abs_avg=28.300098419189453
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6652975082397461, max_abs=2.875, mean_rel=0.11878793686628342, max_rel=8.072821617126465, norm_rel=0.02554396726191044, ref_abs_avg=26.016277313232422, test_abs_avg=25.959320068359375
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8061825037002563, max_abs=6.125, mean_rel=0.1668543517589569, max_rel=732.423095703125, norm_rel=0.026185059919953346, ref_abs_avg=30.89609146118164, test_abs_avg=30.89746856689453
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7528965473175049, max_abs=4.875, mean_rel=0.26766514778137207, max_rel=2125.0, norm_rel=0.024677276611328125, ref_abs_avg=30.63127326965332, test_abs_avg=30.634353637695312
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5427441596984863, max_abs=2.25, mean_rel=0.10337049514055252, max_rel=6.251407623291016, norm_rel=0.023044582456350327, ref_abs_avg=23.375551223754883, test_abs_avg=23.379436492919922
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7355785369873047, max_abs=6.0, mean_rel=0.16826698184013367, max_rel=1262.1004638671875, norm_rel=0.025599978864192963, ref_abs_avg=28.785606384277344, test_abs_avg=28.78504753112793
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6900801658630371, max_abs=4.5, mean_rel=0.24721337854862213, max_rel=2125.0, norm_rel=0.023749638348817825, ref_abs_avg=29.075485229492188, test_abs_avg=29.07105255126953
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5147891044616699, max_abs=2.5, mean_rel=0.10703176259994507, max_rel=12.43659782409668, norm_rel=0.022840537130832672, ref_abs_avg=23.170658111572266, test_abs_avg=23.200096130371094
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.691430926322937, max_abs=5.0, mean_rel=0.16783851385116577, max_rel=1675.5997314453125, norm_rel=0.025145718827843666, ref_abs_avg=27.518192291259766, test_abs_avg=27.518001556396484
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6404418349266052, max_abs=4.34375, mean_rel=0.26199790835380554, max_rel=1687.4998779296875, norm_rel=0.02342822775244713, ref_abs_avg=27.346527099609375, test_abs_avg=27.350563049316406
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5109374523162842, max_abs=2.0625, mean_rel=0.12436042726039886, max_rel=12.146492958068848, norm_rel=0.025015288963913918, ref_abs_avg=21.112316131591797, test_abs_avg=21.121490478515625
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6429296731948853, max_abs=6.0, mean_rel=0.17305631935596466, max_rel=2055.025634765625, norm_rel=0.024751080200076103, ref_abs_avg=26.03152847290039, test_abs_avg=26.031469345092773
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5999246835708618, max_abs=4.0, mean_rel=0.24408376216888428, max_rel=1499.9998779296875, norm_rel=0.023350266739726067, ref_abs_avg=25.69453239440918, test_abs_avg=25.697643280029297
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4728055000305176, max_abs=2.0, mean_rel=0.16258306801319122, max_rel=10.597739219665527, norm_rel=0.023330802097916603, ref_abs_avg=20.371484756469727, test_abs_avg=20.345062255859375
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6076887249946594, max_abs=5.0, mean_rel=0.16073128581047058, max_rel=1056.4931640625, norm_rel=0.024267645552754402, ref_abs_avg=25.100330352783203, test_abs_avg=25.103496551513672
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5588977932929993, max_abs=3.875, mean_rel=0.24991914629936218, max_rel=1593.7498779296875, norm_rel=0.02266692742705345, ref_abs_avg=24.676599502563477, test_abs_avg=24.678665161132812
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.457611083984375, max_abs=1.75, mean_rel=0.09263843297958374, max_rel=3.0419106483459473, norm_rel=0.02298814058303833, ref_abs_avg=19.425987243652344, test_abs_avg=19.443479537963867
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5750459432601929, max_abs=5.0, mean_rel=0.1481998860836029, max_rel=960.8161010742188, norm_rel=0.023942038416862488, ref_abs_avg=24.03264617919922, test_abs_avg=24.032852172851562
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5356752872467041, max_abs=4.0, mean_rel=0.16885598003864288, max_rel=1125.0, norm_rel=0.022200722247362137, ref_abs_avg=24.08206558227539, test_abs_avg=24.08535385131836
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4335116744041443, max_abs=1.966796875, mean_rel=1.3442128896713257, max_rel=605.7988891601562, norm_rel=0.022052664309740067, ref_abs_avg=19.581462860107422, test_abs_avg=19.601348876953125
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5519492626190186, max_abs=5.0, mean_rel=0.14570872485637665, max_rel=1306.20068359375, norm_rel=0.02334592118859291, ref_abs_avg=23.619712829589844, test_abs_avg=23.622175216674805
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5112526416778564, max_abs=4.0, mean_rel=0.21816332638263702, max_rel=1874.9998779296875, norm_rel=0.021814407780766487, ref_abs_avg=23.435043334960938, test_abs_avg=23.423723220825195
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.41170573234558105, max_abs=1.75, mean_rel=0.12017243355512619, max_rel=18.722402572631836, norm_rel=0.02207949385046959, ref_abs_avg=19.1141357421875, test_abs_avg=19.115612030029297
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.526875913143158, max_abs=4.0, mean_rel=0.14634840190410614, max_rel=897.5081176757812, norm_rel=0.023048661649227142, ref_abs_avg=22.87748908996582, test_abs_avg=22.878713607788086
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.47722339630126953, max_abs=3.875, mean_rel=0.1985682249069214, max_rel=1437.4998779296875, norm_rel=0.021556274965405464, ref_abs_avg=22.125816345214844, test_abs_avg=22.12796401977539
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.47324514389038086, max_abs=2.142578125, mean_rel=0.12808376550674438, max_rel=10.26096248626709, norm_rel=0.02383594587445259, ref_abs_avg=19.961383819580078, test_abs_avg=19.977237701416016
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5743428468704224, max_abs=6.0, mean_rel=0.15510684251785278, max_rel=1225.4925537109375, norm_rel=0.024497652426362038, ref_abs_avg=23.48879623413086, test_abs_avg=23.49028205871582
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5271053314208984, max_abs=3.59375, mean_rel=0.22429867088794708, max_rel=1921.8748779296875, norm_rel=0.02289220318198204, ref_abs_avg=23.102611541748047, test_abs_avg=23.11123275756836
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.38992029428482056, max_abs=1.625, mean_rel=0.18307200074195862, max_rel=38.108123779296875, norm_rel=0.021208396181464195, ref_abs_avg=17.91692543029785, test_abs_avg=17.933971405029297
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5251368880271912, max_abs=4.5078125, mean_rel=0.1495734006166458, max_rel=771.6089477539062, norm_rel=0.024139530956745148, ref_abs_avg=21.818984985351562, test_abs_avg=21.82012939453125
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.48566555976867676, max_abs=4.0625, mean_rel=0.21947556734085083, max_rel=1499.9998779296875, norm_rel=0.022625362500548363, ref_abs_avg=21.560178756713867, test_abs_avg=21.559206008911133
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3859065771102905, max_abs=1.5, mean_rel=0.06795454025268555, max_rel=4.094093322753906, norm_rel=0.022012144327163696, ref_abs_avg=18.024023056030273, test_abs_avg=18.040485382080078
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.49261996150016785, max_abs=5.21875, mean_rel=0.14260253310203552, max_rel=990.56201171875, norm_rel=0.0233476385474205, ref_abs_avg=21.12454605102539, test_abs_avg=21.123065948486328
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4450621008872986, max_abs=3.25, mean_rel=0.20478160679340363, max_rel=1499.9998779296875, norm_rel=0.021179256960749626, ref_abs_avg=20.956138610839844, test_abs_avg=20.955223083496094
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3654947280883789, max_abs=1.25, mean_rel=0.09945065528154373, max_rel=7.623475074768066, norm_rel=0.02272794023156166, ref_abs_avg=15.941532135009766, test_abs_avg=15.971758842468262
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4610140323638916, max_abs=4.5, mean_rel=0.14135080575942993, max_rel=1013.6649169921875, norm_rel=0.02298732101917267, ref_abs_avg=20.137493133544922, test_abs_avg=20.137069702148438
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4225410223007202, max_abs=4.0, mean_rel=0.18359141051769257, max_rel=1203.125, norm_rel=0.022033268585801125, ref_abs_avg=19.371784210205078, test_abs_avg=19.372346878051758
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.353914737701416, max_abs=1.4931640625, mean_rel=0.08590497076511383, max_rel=8.588729858398438, norm_rel=0.02121593989431858, ref_abs_avg=16.784528732299805, test_abs_avg=16.776016235351562
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4349421262741089, max_abs=5.0, mean_rel=0.13197381794452667, max_rel=676.1665649414062, norm_rel=0.022081861272454262, ref_abs_avg=19.788219451904297, test_abs_avg=19.789806365966797
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.38583138585090637, max_abs=3.25, mean_rel=0.1764216125011444, max_rel=2062.5, norm_rel=0.020151667296886444, ref_abs_avg=19.198863983154297, test_abs_avg=19.21017074584961
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.324615478515625, max_abs=1.25, mean_rel=0.06822346150875092, max_rel=4.519161224365234, norm_rel=0.01962248794734478, ref_abs_avg=16.632568359375, test_abs_avg=16.62464141845703
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.40372318029403687, max_abs=5.0, mean_rel=0.1275830864906311, max_rel=429.9021301269531, norm_rel=0.021677497774362564, ref_abs_avg=18.789508819580078, test_abs_avg=18.78915023803711
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.37346625328063965, max_abs=3.5, mean_rel=0.16870367527008057, max_rel=1406.2498779296875, norm_rel=0.020632121711969376, ref_abs_avg=18.354978561401367, test_abs_avg=18.351806640625
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.30468034744262695, max_abs=1.125, mean_rel=0.08819637447595596, max_rel=7.212998390197754, norm_rel=0.02083716168999672, ref_abs_avg=14.427091598510742, test_abs_avg=14.423005104064941
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.38692277669906616, max_abs=4.25, mean_rel=0.12711180746555328, max_rel=523.1039428710938, norm_rel=0.021665604785084724, ref_abs_avg=18.046363830566406, test_abs_avg=18.048572540283203
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3580699861049652, max_abs=3.625, mean_rel=0.16962170600891113, max_rel=1343.7498779296875, norm_rel=0.01993636041879654, ref_abs_avg=18.157564163208008, test_abs_avg=18.163677215576172
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2835841178894043, max_abs=1.0625, mean_rel=0.09651613235473633, max_rel=8.168379783630371, norm_rel=0.018395448103547096, ref_abs_avg=15.696924209594727, test_abs_avg=15.699402809143066
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3735237121582031, max_abs=4.5, mean_rel=0.12612856924533844, max_rel=683.6154174804688, norm_rel=0.02124783955514431, ref_abs_avg=17.851943969726562, test_abs_avg=17.851402282714844
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.34888750314712524, max_abs=3.25, mean_rel=0.17240193486213684, max_rel=1093.75, norm_rel=0.020608648657798767, ref_abs_avg=17.353809356689453, test_abs_avg=17.35584259033203
production_forward2 vs paper_forward output: mean_abs=0.0016166024142876267, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.00858888030052185, max_abs=0.421875, mean_rel=0.0745067447423935, max_rel=91.1047134399414, norm_rel=0.020371120423078537, ref_abs_avg=0.45481181144714355, test_abs_avg=0.4548187553882599
production_forward2 grad[1] vs paper_forward: mean_abs=7.366345405578613, max_abs=62.0, mean_rel=0.14197607338428497, max_rel=184.12802124023438, norm_rel=0.020689181983470917, ref_abs_avg=315.0055236816406, test_abs_avg=314.95953369140625
production_forward2 grad[2] vs paper_forward: mean_abs=1.235213279724121, max_abs=4.8125, mean_rel=0.09503096342086792, max_rel=5.237173557281494, norm_rel=0.024593215435743332, ref_abs_avg=48.83543395996094, test_abs_avg=48.88179016113281
production_forward2 grad[3] vs paper_forward: mean_abs=1.644540786743164, max_abs=11.0, mean_rel=0.16813057661056519, max_rel=1756.420654296875, norm_rel=0.025105852633714676, ref_abs_avg=65.91633605957031, test_abs_avg=65.90884399414062
production_forward2 grad[4] vs paper_forward: mean_abs=1.524035930633545, max_abs=9.125, mean_rel=0.3444299101829529, max_rel=4375.0, norm_rel=0.023609541356563568, ref_abs_avg=64.87712097167969, test_abs_avg=64.86576843261719
production_forward2 grad[5] vs paper_forward: mean_abs=1.099945306777954, max_abs=4.0, mean_rel=0.15882670879364014, max_rel=34.7890625, norm_rel=0.022522008046507835, ref_abs_avg=49.07929992675781, test_abs_avg=49.130062103271484
production_forward2 grad[6] vs paper_forward: mean_abs=1.4272546768188477, max_abs=10.0, mean_rel=0.1721118688583374, max_rel=1001.8971557617188, norm_rel=0.02486378513276577, ref_abs_avg=57.72821807861328, test_abs_avg=57.731632232666016
production_forward2 grad[7] vs paper_forward: mean_abs=1.3208110332489014, max_abs=7.5, mean_rel=0.4037299156188965, max_rel=4750.0, norm_rel=0.0233877282589674, ref_abs_avg=56.76704406738281, test_abs_avg=56.76634216308594
production_forward2 grad[8] vs paper_forward: mean_abs=1.013308048248291, max_abs=4.0, mean_rel=0.06710744649171829, max_rel=1.8352981805801392, norm_rel=0.022951027378439903, ref_abs_avg=44.5574951171875, test_abs_avg=44.455284118652344
production_forward2 grad[9] vs paper_forward: mean_abs=1.2907582521438599, max_abs=9.0, mean_rel=0.16384729743003845, max_rel=1183.334716796875, norm_rel=0.024668021127581596, ref_abs_avg=52.64033508300781, test_abs_avg=52.64122009277344
production_forward2 grad[10] vs paper_forward: mean_abs=1.1892528533935547, max_abs=7.0, mean_rel=0.36128419637680054, max_rel=3281.249755859375, norm_rel=0.022960009053349495, ref_abs_avg=52.12464141845703, test_abs_avg=52.12683868408203
production_forward2 grad[11] vs paper_forward: mean_abs=0.9505105018615723, max_abs=3.5625, mean_rel=0.07310181856155396, max_rel=1.804094910621643, norm_rel=0.023011328652501106, ref_abs_avg=40.86827850341797, test_abs_avg=40.86035919189453
production_forward2 grad[12] vs paper_forward: mean_abs=1.1869535446166992, max_abs=10.0, mean_rel=0.17600524425506592, max_rel=2689.498046875, norm_rel=0.024406621232628822, ref_abs_avg=48.98662567138672, test_abs_avg=48.989013671875
production_forward2 grad[13] vs paper_forward: mean_abs=1.1022322177886963, max_abs=7.0, mean_rel=0.35547518730163574, max_rel=4125.0, norm_rel=0.02287021279335022, ref_abs_avg=48.3834114074707, test_abs_avg=48.38497543334961
production_forward2 grad[14] vs paper_forward: mean_abs=0.85845947265625, max_abs=3.5, mean_rel=0.12317485362291336, max_rel=14.909645080566406, norm_rel=0.023911045864224434, ref_abs_avg=36.00436782836914, test_abs_avg=35.98460388183594
production_forward2 grad[15] vs paper_forward: mean_abs=1.113168716430664, max_abs=8.0, mean_rel=0.15380136668682098, max_rel=1044.677490234375, norm_rel=0.02424335666000843, ref_abs_avg=46.23767852783203, test_abs_avg=46.24072265625
production_forward2 grad[16] vs paper_forward: mean_abs=1.0236952304840088, max_abs=7.0, mean_rel=0.24787043035030365, max_rel=2749.999755859375, norm_rel=0.022633645683526993, ref_abs_avg=45.48958969116211, test_abs_avg=45.479515075683594
production_forward2 grad[17] vs paper_forward: mean_abs=0.8199481964111328, max_abs=3.75, mean_rel=0.11555886268615723, max_rel=13.80928897857666, norm_rel=0.023110458627343178, ref_abs_avg=35.97944259643555, test_abs_avg=35.94978332519531
production_forward2 grad[18] vs paper_forward: mean_abs=1.0515820980072021, max_abs=7.0, mean_rel=0.16576169431209564, max_rel=1874.3563232421875, norm_rel=0.024111788719892502, ref_abs_avg=43.86655044555664, test_abs_avg=43.86805725097656
production_forward2 grad[19] vs paper_forward: mean_abs=0.9680024981498718, max_abs=6.0, mean_rel=0.32421088218688965, max_rel=3187.499755859375, norm_rel=0.02247505635023117, ref_abs_avg=43.249839782714844, test_abs_avg=43.2571907043457
production_forward2 grad[20] vs paper_forward: mean_abs=0.7691906690597534, max_abs=2.703125, mean_rel=0.31515398621559143, max_rel=97.17938232421875, norm_rel=0.021991997957229614, ref_abs_avg=34.75629425048828, test_abs_avg=34.74427032470703
production_forward2 grad[21] vs paper_forward: mean_abs=0.9991573095321655, max_abs=6.75, mean_rel=0.1643410474061966, max_rel=1463.1351318359375, norm_rel=0.024005478248000145, ref_abs_avg=41.84503936767578, test_abs_avg=41.847801208496094
production_forward2 grad[22] vs paper_forward: mean_abs=0.9176573753356934, max_abs=6.0, mean_rel=0.2818334698677063, max_rel=2062.5, norm_rel=0.022412791848182678, ref_abs_avg=41.09926223754883, test_abs_avg=41.102142333984375
production_forward2 grad[23] vs paper_forward: mean_abs=0.7251746654510498, max_abs=3.046875, mean_rel=0.2319643199443817, max_rel=43.2999382019043, norm_rel=0.02286028116941452, ref_abs_avg=32.29875183105469, test_abs_avg=32.38917541503906
production_forward2 grad[24] vs paper_forward: mean_abs=0.9403942823410034, max_abs=7.0, mean_rel=0.1577904373407364, max_rel=1797.1953125, norm_rel=0.023821435868740082, ref_abs_avg=39.6866340637207, test_abs_avg=39.689208984375
production_forward2 grad[25] vs paper_forward: mean_abs=0.8691142797470093, max_abs=5.0, mean_rel=0.2723446488380432, max_rel=2375.0, norm_rel=0.022196322679519653, ref_abs_avg=39.316078186035156, test_abs_avg=39.31719207763672
production_forward2 grad[26] vs paper_forward: mean_abs=0.8333252668380737, max_abs=2.875, mean_rel=0.09013836085796356, max_rel=7.785390853881836, norm_rel=0.02601461485028267, ref_abs_avg=32.370418548583984, test_abs_avg=32.357757568359375
production_forward2 grad[27] vs paper_forward: mean_abs=1.0675239562988281, max_abs=7.0, mean_rel=0.16955898702144623, max_rel=1971.0975341796875, norm_rel=0.02590775117278099, ref_abs_avg=41.405975341796875, test_abs_avg=41.40333557128906
production_forward2 grad[28] vs paper_forward: mean_abs=0.993267297744751, max_abs=6.0, mean_rel=0.3862467408180237, max_rel=4750.0, norm_rel=0.02414357103407383, ref_abs_avg=41.32453155517578, test_abs_avg=41.326087951660156
production_forward2 grad[29] vs paper_forward: mean_abs=0.7609996795654297, max_abs=2.609375, mean_rel=0.10114923119544983, max_rel=4.375133991241455, norm_rel=0.024076014757156372, ref_abs_avg=31.15803337097168, test_abs_avg=31.144763946533203
production_forward2 grad[30] vs paper_forward: mean_abs=0.9993318915367126, max_abs=7.0, mean_rel=0.18779343366622925, max_rel=2487.732177734375, norm_rel=0.026214245706796646, ref_abs_avg=38.30419921875, test_abs_avg=38.302162170410156
production_forward2 grad[31] vs paper_forward: mean_abs=0.9357764720916748, max_abs=6.0, mean_rel=0.2764127254486084, max_rel=2812.499755859375, norm_rel=0.024584582075476646, ref_abs_avg=38.12379455566406, test_abs_avg=38.124332427978516
production_forward2 grad[32] vs paper_forward: mean_abs=0.7442214488983154, max_abs=3.1875, mean_rel=0.08083730936050415, max_rel=2.34262752532959, norm_rel=0.024991080164909363, ref_abs_avg=29.485721588134766, test_abs_avg=29.476016998291016
production_forward2 grad[33] vs paper_forward: mean_abs=0.9330418109893799, max_abs=6.5, mean_rel=0.18692785501480103, max_rel=1888.167236328125, norm_rel=0.026161426678299904, ref_abs_avg=35.805809020996094, test_abs_avg=35.806068420410156
production_forward2 grad[34] vs paper_forward: mean_abs=0.8708531856536865, max_abs=6.0, mean_rel=0.28507891297340393, max_rel=2390.625, norm_rel=0.024558451026678085, ref_abs_avg=35.57959747314453, test_abs_avg=35.589759826660156
production_forward2 grad[35] vs paper_forward: mean_abs=0.705435037612915, max_abs=3.0, mean_rel=0.17944103479385376, max_rel=23.95341682434082, norm_rel=0.023918744176626205, ref_abs_avg=29.39316749572754, test_abs_avg=29.37979507446289
production_forward2 grad[36] vs paper_forward: mean_abs=0.8777918815612793, max_abs=6.0, mean_rel=0.17381355166435242, max_rel=1228.67431640625, norm_rel=0.02585873007774353, ref_abs_avg=34.09892272949219, test_abs_avg=34.0976448059082
production_forward2 grad[37] vs paper_forward: mean_abs=0.8217129707336426, max_abs=6.0, mean_rel=0.30515995621681213, max_rel=2874.999755859375, norm_rel=0.02427830919623375, ref_abs_avg=33.926490783691406, test_abs_avg=33.92658233642578
production_forward2 grad[38] vs paper_forward: mean_abs=0.6354484558105469, max_abs=3.109375, mean_rel=0.11372482776641846, max_rel=10.979729652404785, norm_rel=0.024120647460222244, ref_abs_avg=27.7581787109375, test_abs_avg=27.745868682861328
production_forward2 grad[39] vs paper_forward: mean_abs=0.8355610966682434, max_abs=5.625, mean_rel=0.1684924066066742, max_rel=1332.8243408203125, norm_rel=0.025561727583408356, ref_abs_avg=32.768310546875, test_abs_avg=32.769447326660156
production_forward2 grad[40] vs paper_forward: mean_abs=0.7779139280319214, max_abs=5.0, mean_rel=0.27231189608573914, max_rel=1968.7498779296875, norm_rel=0.0240827277302742, ref_abs_avg=32.347808837890625, test_abs_avg=32.336997985839844
production_forward2 grad[41] vs paper_forward: mean_abs=0.6144192218780518, max_abs=2.5, mean_rel=0.13420891761779785, max_rel=13.345170021057129, norm_rel=0.023278163745999336, ref_abs_avg=25.69757652282715, test_abs_avg=25.7004337310791
production_forward2 grad[42] vs paper_forward: mean_abs=0.7858576774597168, max_abs=5.5, mean_rel=0.16227951645851135, max_rel=1137.6773681640625, norm_rel=0.025374753400683403, ref_abs_avg=31.066022872924805, test_abs_avg=31.067718505859375
production_forward2 grad[43] vs paper_forward: mean_abs=0.7324081659317017, max_abs=4.75, mean_rel=0.3213525712490082, max_rel=2437.5, norm_rel=0.024161964654922485, ref_abs_avg=30.383153915405273, test_abs_avg=30.380495071411133
production_forward2 grad[44] vs paper_forward: mean_abs=0.586672306060791, max_abs=2.28125, mean_rel=0.09875944256782532, max_rel=6.907411575317383, norm_rel=0.023108800873160362, ref_abs_avg=25.215368270874023, test_abs_avg=25.21261215209961
production_forward2 grad[45] vs paper_forward: mean_abs=0.7483785152435303, max_abs=5.0, mean_rel=0.17202657461166382, max_rel=1117.3221435546875, norm_rel=0.025100121274590492, ref_abs_avg=29.932769775390625, test_abs_avg=29.935352325439453
production_forward2 grad[46] vs paper_forward: mean_abs=0.6954481601715088, max_abs=4.28125, mean_rel=0.28045737743377686, max_rel=3124.999755859375, norm_rel=0.023453354835510254, ref_abs_avg=29.719924926757812, test_abs_avg=29.725513458251953
production_forward2 grad[47] vs paper_forward: mean_abs=0.5595798492431641, max_abs=2.5, mean_rel=0.14668568968772888, max_rel=13.24456787109375, norm_rel=0.02212882786989212, ref_abs_avg=24.84876251220703, test_abs_avg=24.829023361206055
production_forward2 grad[48] vs paper_forward: mean_abs=0.712935209274292, max_abs=6.0, mean_rel=0.14958535134792328, max_rel=911.3397827148438, norm_rel=0.02469669096171856, ref_abs_avg=28.958341598510742, test_abs_avg=28.960107803344727
production_forward2 grad[49] vs paper_forward: mean_abs=0.6629766225814819, max_abs=4.5, mean_rel=0.2969023287296295, max_rel=2062.5, norm_rel=0.023456957191228867, ref_abs_avg=28.30425262451172, test_abs_avg=28.298442840576172
production_forward2 grad[50] vs paper_forward: mean_abs=0.6468210220336914, max_abs=3.125, mean_rel=0.11271354556083679, max_rel=10.980854988098145, norm_rel=0.025038009509444237, ref_abs_avg=26.016277313232422, test_abs_avg=25.949756622314453
production_forward2 grad[51] vs paper_forward: mean_abs=0.8049789071083069, max_abs=7.0, mean_rel=0.16453179717063904, max_rel=1509.232177734375, norm_rel=0.02615795098245144, ref_abs_avg=30.89609146118164, test_abs_avg=30.896949768066406
production_forward2 grad[52] vs paper_forward: mean_abs=0.7525018453598022, max_abs=4.75, mean_rel=0.24930578470230103, max_rel=2937.499755859375, norm_rel=0.024645235389471054, ref_abs_avg=30.63127326965332, test_abs_avg=30.63574981689453
production_forward2 grad[53] vs paper_forward: mean_abs=0.5519547462463379, max_abs=2.0625, mean_rel=0.10519953817129135, max_rel=8.05361557006836, norm_rel=0.023505518212914467, ref_abs_avg=23.375551223754883, test_abs_avg=23.402118682861328
production_forward2 grad[54] vs paper_forward: mean_abs=0.7339605093002319, max_abs=4.875, mean_rel=0.17062899470329285, max_rel=1207.44921875, norm_rel=0.025547880679368973, ref_abs_avg=28.785606384277344, test_abs_avg=28.785476684570312
production_forward2 grad[55] vs paper_forward: mean_abs=0.6870526075363159, max_abs=4.5, mean_rel=0.24104778468608856, max_rel=1999.9998779296875, norm_rel=0.023664439097046852, ref_abs_avg=29.075485229492188, test_abs_avg=29.071176528930664
production_forward2 grad[56] vs paper_forward: mean_abs=0.5230875015258789, max_abs=2.75, mean_rel=0.12039446830749512, max_rel=13.346988677978516, norm_rel=0.023012999445199966, ref_abs_avg=23.170658111572266, test_abs_avg=23.19528579711914
production_forward2 grad[57] vs paper_forward: mean_abs=0.6900129318237305, max_abs=5.03125, mean_rel=0.1658398061990738, max_rel=1310.4708251953125, norm_rel=0.0250871479511261, ref_abs_avg=27.518192291259766, test_abs_avg=27.517602920532227
production_forward2 grad[58] vs paper_forward: mean_abs=0.6409039497375488, max_abs=4.0, mean_rel=0.25879591703414917, max_rel=1687.4998779296875, norm_rel=0.02346351370215416, ref_abs_avg=27.346527099609375, test_abs_avg=27.348405838012695
production_forward2 grad[59] vs paper_forward: mean_abs=0.488258957862854, max_abs=1.859375, mean_rel=0.1376262903213501, max_rel=19.458581924438477, norm_rel=0.024045255035161972, ref_abs_avg=21.112316131591797, test_abs_avg=21.14014434814453
production_forward2 grad[60] vs paper_forward: mean_abs=0.641399621963501, max_abs=6.0, mean_rel=0.17097097635269165, max_rel=1796.58544921875, norm_rel=0.024696191772818565, ref_abs_avg=26.03152847290039, test_abs_avg=26.03101921081543
production_forward2 grad[61] vs paper_forward: mean_abs=0.5974017977714539, max_abs=4.5, mean_rel=0.24320749938488007, max_rel=1562.4998779296875, norm_rel=0.023242007941007614, ref_abs_avg=25.69453239440918, test_abs_avg=25.696880340576172
production_forward2 grad[62] vs paper_forward: mean_abs=0.46173930168151855, max_abs=2.1875, mean_rel=0.15498140454292297, max_rel=13.718064308166504, norm_rel=0.023188848048448563, ref_abs_avg=20.371484756469727, test_abs_avg=20.33243179321289
production_forward2 grad[63] vs paper_forward: mean_abs=0.6065448522567749, max_abs=4.5, mean_rel=0.16022861003875732, max_rel=1036.8099365234375, norm_rel=0.02422378398478031, ref_abs_avg=25.100330352783203, test_abs_avg=25.102184295654297
production_forward2 grad[64] vs paper_forward: mean_abs=0.5607200860977173, max_abs=3.703125, mean_rel=0.23711752891540527, max_rel=1656.2498779296875, norm_rel=0.022726327180862427, ref_abs_avg=24.676599502563477, test_abs_avg=24.680522918701172
production_forward2 grad[65] vs paper_forward: mean_abs=0.4413111209869385, max_abs=1.625, mean_rel=0.09268584847450256, max_rel=3.771393299102783, norm_rel=0.022484896704554558, ref_abs_avg=19.425987243652344, test_abs_avg=19.450502395629883
production_forward2 grad[66] vs paper_forward: mean_abs=0.5739854574203491, max_abs=4.8203125, mean_rel=0.14738894999027252, max_rel=913.9017944335938, norm_rel=0.02389787696301937, ref_abs_avg=24.03264617919922, test_abs_avg=24.031997680664062
production_forward2 grad[67] vs paper_forward: mean_abs=0.5347741842269897, max_abs=4.25, mean_rel=0.17569640278816223, max_rel=1593.7498779296875, norm_rel=0.022196078673005104, ref_abs_avg=24.08206558227539, test_abs_avg=24.08823013305664
production_forward2 grad[68] vs paper_forward: mean_abs=0.4362393021583557, max_abs=1.75, mean_rel=0.766812264919281, max_rel=317.1812438964844, norm_rel=0.022228632122278214, ref_abs_avg=19.581462860107422, test_abs_avg=19.589927673339844
production_forward2 grad[69] vs paper_forward: mean_abs=0.5512682199478149, max_abs=4.5, mean_rel=0.14650581777095795, max_rel=1187.8917236328125, norm_rel=0.023310856893658638, ref_abs_avg=23.619712829589844, test_abs_avg=23.622591018676758
production_forward2 grad[70] vs paper_forward: mean_abs=0.5082999467849731, max_abs=3.71875, mean_rel=0.21265064179897308, max_rel=1968.7498779296875, norm_rel=0.021651053801178932, ref_abs_avg=23.435043334960938, test_abs_avg=23.426204681396484
production_forward2 grad[71] vs paper_forward: mean_abs=0.40085673332214355, max_abs=1.5, mean_rel=0.12475906312465668, max_rel=25.498050689697266, norm_rel=0.021449102088809013, ref_abs_avg=19.1141357421875, test_abs_avg=19.114261627197266
production_forward2 grad[72] vs paper_forward: mean_abs=0.5265613198280334, max_abs=4.0, mean_rel=0.14715522527694702, max_rel=863.2821044921875, norm_rel=0.023034553974866867, ref_abs_avg=22.87748908996582, test_abs_avg=22.8787841796875
production_forward2 grad[73] vs paper_forward: mean_abs=0.4787828326225281, max_abs=3.4375, mean_rel=0.19547510147094727, max_rel=1374.9998779296875, norm_rel=0.021627899259328842, ref_abs_avg=22.125816345214844, test_abs_avg=22.128742218017578
production_forward2 grad[74] vs paper_forward: mean_abs=0.47783565521240234, max_abs=2.0, mean_rel=0.1117452010512352, max_rel=5.6179070472717285, norm_rel=0.02421804890036583, ref_abs_avg=19.961383819580078, test_abs_avg=19.964401245117188
production_forward2 grad[75] vs paper_forward: mean_abs=0.5744072198867798, max_abs=4.75, mean_rel=0.15425574779510498, max_rel=836.5012817382812, norm_rel=0.02448599599301815, ref_abs_avg=23.48879623413086, test_abs_avg=23.491500854492188
production_forward2 grad[76] vs paper_forward: mean_abs=0.5261455178260803, max_abs=4.0, mean_rel=0.2283288836479187, max_rel=1593.7498779296875, norm_rel=0.022847631946206093, ref_abs_avg=23.102611541748047, test_abs_avg=23.111000061035156
production_forward2 grad[77] vs paper_forward: mean_abs=0.3913179039955139, max_abs=1.5, mean_rel=0.20210157334804535, max_rel=50.320167541503906, norm_rel=0.021129703149199486, ref_abs_avg=17.91692543029785, test_abs_avg=17.964027404785156
production_forward2 grad[78] vs paper_forward: mean_abs=0.524732232093811, max_abs=4.5, mean_rel=0.14559641480445862, max_rel=638.9298095703125, norm_rel=0.02412508614361286, ref_abs_avg=21.818984985351562, test_abs_avg=21.819992065429688
production_forward2 grad[79] vs paper_forward: mean_abs=0.48044389486312866, max_abs=4.1875, mean_rel=0.20071706175804138, max_rel=1125.0, norm_rel=0.022407719865441322, ref_abs_avg=21.560178756713867, test_abs_avg=21.563173294067383
production_forward2 grad[80] vs paper_forward: mean_abs=0.3769645690917969, max_abs=1.625, mean_rel=0.06468470394611359, max_rel=1.803127646446228, norm_rel=0.021673493087291718, ref_abs_avg=18.024023056030273, test_abs_avg=18.037023544311523
production_forward2 grad[81] vs paper_forward: mean_abs=0.49180006980895996, max_abs=4.84375, mean_rel=0.14432969689369202, max_rel=1121.783447265625, norm_rel=0.02330990508198738, ref_abs_avg=21.12454605102539, test_abs_avg=21.123668670654297
production_forward2 grad[82] vs paper_forward: mean_abs=0.4443966746330261, max_abs=3.203125, mean_rel=0.21527421474456787, max_rel=1624.9998779296875, norm_rel=0.021130630746483803, ref_abs_avg=20.956138610839844, test_abs_avg=20.95208740234375
production_forward2 grad[83] vs paper_forward: mean_abs=0.368772029876709, max_abs=1.5625, mean_rel=0.0837244987487793, max_rel=4.907351970672607, norm_rel=0.023186633363366127, ref_abs_avg=15.941532135009766, test_abs_avg=15.971899032592773
production_forward2 grad[84] vs paper_forward: mean_abs=0.4601219892501831, max_abs=4.0, mean_rel=0.14064070582389832, max_rel=1190.0057373046875, norm_rel=0.022947795689105988, ref_abs_avg=20.137493133544922, test_abs_avg=20.137907028198242
production_forward2 grad[85] vs paper_forward: mean_abs=0.4225345849990845, max_abs=3.75, mean_rel=0.18413914740085602, max_rel=1328.1248779296875, norm_rel=0.02199808694422245, ref_abs_avg=19.371784210205078, test_abs_avg=19.374025344848633
production_forward2 grad[86] vs paper_forward: mean_abs=0.3517448902130127, max_abs=1.2744140625, mean_rel=0.09210462868213654, max_rel=7.330471992492676, norm_rel=0.021205559372901917, ref_abs_avg=16.784528732299805, test_abs_avg=16.781373977661133
production_forward2 grad[87] vs paper_forward: mean_abs=0.43448683619499207, max_abs=3.875, mean_rel=0.1333296000957489, max_rel=864.9937744140625, norm_rel=0.022045832127332687, ref_abs_avg=19.788219451904297, test_abs_avg=19.79039764404297
production_forward2 grad[88] vs paper_forward: mean_abs=0.3864739239215851, max_abs=3.75, mean_rel=0.17525175213813782, max_rel=1499.9998779296875, norm_rel=0.020230399444699287, ref_abs_avg=19.198863983154297, test_abs_avg=19.20718002319336
production_forward2 grad[89] vs paper_forward: mean_abs=0.3213081359863281, max_abs=1.359375, mean_rel=0.07316631823778152, max_rel=6.462400436401367, norm_rel=0.019496235996484756, ref_abs_avg=16.632568359375, test_abs_avg=16.636791229248047
production_forward2 grad[90] vs paper_forward: mean_abs=0.4033186435699463, max_abs=4.0, mean_rel=0.12766225636005402, max_rel=649.0977783203125, norm_rel=0.021644141525030136, ref_abs_avg=18.789508819580078, test_abs_avg=18.789745330810547
production_forward2 grad[91] vs paper_forward: mean_abs=0.3680455684661865, max_abs=4.0, mean_rel=0.16782580316066742, max_rel=1250.0, norm_rel=0.020260263234376907, ref_abs_avg=18.354978561401367, test_abs_avg=18.351472854614258
production_forward2 grad[92] vs paper_forward: mean_abs=0.3128371238708496, max_abs=1.09375, mean_rel=0.0857120007276535, max_rel=5.531534194946289, norm_rel=0.02111007273197174, ref_abs_avg=14.427091598510742, test_abs_avg=14.43162727355957
production_forward2 grad[93] vs paper_forward: mean_abs=0.3853319585323334, max_abs=3.75, mean_rel=0.12764282524585724, max_rel=581.2577514648438, norm_rel=0.02156950533390045, ref_abs_avg=18.046363830566406, test_abs_avg=18.04928970336914
production_forward2 grad[94] vs paper_forward: mean_abs=0.35809218883514404, max_abs=4.5, mean_rel=0.16735443472862244, max_rel=1499.9998779296875, norm_rel=0.01990479603409767, ref_abs_avg=18.157564163208008, test_abs_avg=18.163421630859375
production_forward2 grad[95] vs paper_forward: mean_abs=0.280862033367157, max_abs=1.0625, mean_rel=0.08195485919713974, max_rel=5.1442437171936035, norm_rel=0.018514219671487808, ref_abs_avg=15.696924209594727, test_abs_avg=15.697011947631836
production_forward2 grad[96] vs paper_forward: mean_abs=0.3729966878890991, max_abs=4.0, mean_rel=0.1235806867480278, max_rel=494.5987854003906, norm_rel=0.021205762401223183, ref_abs_avg=17.851943969726562, test_abs_avg=17.851787567138672
production_forward2 grad[97] vs paper_forward: mean_abs=0.34444737434387207, max_abs=4.0, mean_rel=0.1668352335691452, max_rel=1125.0, norm_rel=0.020398588851094246, ref_abs_avg=17.353809356689453, test_abs_avg=17.356830596923828
identity layers + randn queries
production_forward2 fwd+bwd:  191.546 ms
production_forward2 bwd-only: 172.462 ms
production_forward2 peak allocated: fwd=2.864 GiB, fwd+bwd=6.243 GiB
production_forward2 peak reserved:  fwd=3.230 GiB, fwd+bwd=8.980 GiB
torch_compile_phases_forward fwd+bwd:  168.252 ms
torch_compile_phases_forward bwd-only: 132.529 ms
torch_compile_phases_forward peak allocated: fwd=13.078 GiB, fwd+bwd=13.706 GiB
torch_compile_phases_forward peak reserved:  fwd=13.375 GiB, fwd+bwd=17.627 GiB
paper_forward fwd+bwd:  384.360 ms
paper_forward bwd-only: 304.068 ms
paper_forward peak allocated: fwd=30.003 GiB, fwd+bwd=32.122 GiB
paper_forward peak reserved:  fwd=30.021 GiB, fwd+bwd=32.771 GiB
production_forward fwd+bwd:  114.518 ms
production_forward bwd-only: 95.907 ms
production_forward peak allocated: fwd=3.368 GiB, fwd+bwd=10.493 GiB
production_forward peak reserved:  fwd=3.605 GiB, fwd+bwd=11.605 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001662238035351038, max_abs=0.046875
production_forward grad[0] vs paper_forward: mean_abs=0.008447043597698212, max_abs=0.328125, mean_rel=0.0735897347331047, max_rel=149.51112365722656, norm_rel=0.020210416987538338, ref_abs_avg=0.453069269657135, test_abs_avg=0.4530871510505676
production_forward grad[1] vs paper_forward: mean_abs=7.361815929412842, max_abs=56.0, mean_rel=0.22425825893878937, max_rel=594.4265747070312, norm_rel=0.020986031740903854, ref_abs_avg=317.2802429199219, test_abs_avg=317.2477111816406
production_forward grad[2] vs paper_forward: mean_abs=1.279672622680664, max_abs=5.0, mean_rel=0.0946737751364708, max_rel=2.7279422283172607, norm_rel=0.025753216817975044, ref_abs_avg=48.644569396972656, test_abs_avg=48.674625396728516
production_forward grad[3] vs paper_forward: mean_abs=1.5752453804016113, max_abs=10.0, mean_rel=0.17676421999931335, max_rel=1294.4486083984375, norm_rel=0.02485358528792858, ref_abs_avg=63.85111618041992, test_abs_avg=63.86104202270508
production_forward grad[4] vs paper_forward: mean_abs=1.4668259620666504, max_abs=10.0, mean_rel=0.4720941483974457, max_rel=4687.5, norm_rel=0.023220501840114594, ref_abs_avg=63.49517822265625, test_abs_avg=63.5097770690918
production_forward grad[5] vs paper_forward: mean_abs=1.0483787059783936, max_abs=4.0, mean_rel=0.12361589819192886, max_rel=15.174751281738281, norm_rel=0.022372351959347725, ref_abs_avg=47.90660095214844, test_abs_avg=47.906333923339844
production_forward grad[6] vs paper_forward: mean_abs=1.394273042678833, max_abs=9.0, mean_rel=0.15762633085250854, max_rel=1489.3486328125, norm_rel=0.02465270273387432, ref_abs_avg=56.942420959472656, test_abs_avg=56.94382858276367
production_forward grad[7] vs paper_forward: mean_abs=1.2885675430297852, max_abs=7.5, mean_rel=0.3252441883087158, max_rel=4218.75, norm_rel=0.022941578179597855, ref_abs_avg=56.550445556640625, test_abs_avg=56.550819396972656
production_forward grad[8] vs paper_forward: mean_abs=0.9871206283569336, max_abs=3.5, mean_rel=0.14987590909004211, max_rel=17.260528564453125, norm_rel=0.023590410128235817, ref_abs_avg=41.8624267578125, test_abs_avg=41.95838165283203
production_forward grad[9] vs paper_forward: mean_abs=1.2767372131347656, max_abs=10.0, mean_rel=0.17081403732299805, max_rel=2887.756591796875, norm_rel=0.024436431005597115, ref_abs_avg=52.54780197143555, test_abs_avg=52.554847717285156
production_forward grad[10] vs paper_forward: mean_abs=1.181396484375, max_abs=7.0, mean_rel=0.3562554121017456, max_rel=3015.624755859375, norm_rel=0.022761965170502663, ref_abs_avg=52.1676139831543, test_abs_avg=52.16465759277344
production_forward grad[11] vs paper_forward: mean_abs=0.924163818359375, max_abs=3.5, mean_rel=0.1087440475821495, max_rel=10.846016883850098, norm_rel=0.023918265476822853, ref_abs_avg=39.120994567871094, test_abs_avg=39.088844299316406
production_forward grad[12] vs paper_forward: mean_abs=1.1879878044128418, max_abs=9.0, mean_rel=0.16413545608520508, max_rel=1410.9066162109375, norm_rel=0.024248121306300163, ref_abs_avg=49.30253219604492, test_abs_avg=49.30218505859375
production_forward grad[13] vs paper_forward: mean_abs=1.0913975238800049, max_abs=6.3125, mean_rel=0.3463146686553955, max_rel=3124.999755859375, norm_rel=0.0227816104888916, ref_abs_avg=48.1954460144043, test_abs_avg=48.19544219970703
production_forward grad[14] vs paper_forward: mean_abs=0.8671377301216125, max_abs=3.5625, mean_rel=1.0862563848495483, max_rel=507.6020812988281, norm_rel=0.022638849914073944, ref_abs_avg=37.9934196472168, test_abs_avg=37.89112854003906
production_forward grad[15] vs paper_forward: mean_abs=1.1163220405578613, max_abs=7.0, mean_rel=0.1568654179573059, max_rel=1309.0252685546875, norm_rel=0.02413949742913246, ref_abs_avg=46.49367904663086, test_abs_avg=46.49534606933594
production_forward grad[16] vs paper_forward: mean_abs=1.030570387840271, max_abs=6.5, mean_rel=0.3275974690914154, max_rel=2953.124755859375, norm_rel=0.022471649572253227, ref_abs_avg=46.099632263183594, test_abs_avg=46.10444641113281
production_forward grad[17] vs paper_forward: mean_abs=0.8031797409057617, max_abs=3.03125, mean_rel=0.13551300764083862, max_rel=26.549020767211914, norm_rel=0.023564767092466354, ref_abs_avg=34.75095748901367, test_abs_avg=34.73808288574219
production_forward grad[18] vs paper_forward: mean_abs=1.0470794439315796, max_abs=7.0, mean_rel=0.16176235675811768, max_rel=2055.21630859375, norm_rel=0.024068668484687805, ref_abs_avg=43.780189514160156, test_abs_avg=43.77936935424805
production_forward grad[19] vs paper_forward: mean_abs=0.9667373895645142, max_abs=6.0, mean_rel=0.33387452363967896, max_rel=3187.499755859375, norm_rel=0.022404246032238007, ref_abs_avg=43.366493225097656, test_abs_avg=43.36811828613281
production_forward grad[20] vs paper_forward: mean_abs=0.7906934022903442, max_abs=3.125, mean_rel=0.10672648996114731, max_rel=6.460620880126953, norm_rel=0.025495430454611778, ref_abs_avg=32.223350524902344, test_abs_avg=32.17216873168945
production_forward grad[21] vs paper_forward: mean_abs=0.993181586265564, max_abs=8.0, mean_rel=0.1600819230079651, max_rel=1639.379150390625, norm_rel=0.02382328361272812, ref_abs_avg=41.9398193359375, test_abs_avg=41.943626403808594
production_forward grad[22] vs paper_forward: mean_abs=0.9123709201812744, max_abs=5.6875, mean_rel=0.33790266513824463, max_rel=3624.999755859375, norm_rel=0.02225695736706257, ref_abs_avg=41.16614532470703, test_abs_avg=41.16731643676758
production_forward grad[23] vs paper_forward: mean_abs=0.7018623352050781, max_abs=3.5, mean_rel=0.08209486305713654, max_rel=4.657227993011475, norm_rel=0.023262541741132736, ref_abs_avg=31.013700485229492, test_abs_avg=31.115205764770508
production_forward grad[24] vs paper_forward: mean_abs=0.9483323693275452, max_abs=6.5, mean_rel=0.15394189953804016, max_rel=1040.231201171875, norm_rel=0.023708146065473557, ref_abs_avg=40.20350646972656, test_abs_avg=40.205841064453125
production_forward grad[25] vs paper_forward: mean_abs=0.8740732073783875, max_abs=5.25, mean_rel=0.2767171263694763, max_rel=3687.499755859375, norm_rel=0.022196197882294655, ref_abs_avg=39.53282928466797, test_abs_avg=39.5325813293457
production_forward grad[26] vs paper_forward: mean_abs=0.8444757461547852, max_abs=4.0, mean_rel=0.12580907344818115, max_rel=13.245817184448242, norm_rel=0.025446049869060516, ref_abs_avg=33.07460021972656, test_abs_avg=33.092742919921875
production_forward grad[27] vs paper_forward: mean_abs=1.0940144062042236, max_abs=7.25, mean_rel=0.17918530106544495, max_rel=2263.46435546875, norm_rel=0.025645675137639046, ref_abs_avg=42.867958068847656, test_abs_avg=42.86893081665039
production_forward grad[28] vs paper_forward: mean_abs=1.0268208980560303, max_abs=6.75, mean_rel=0.3772975504398346, max_rel=4187.5, norm_rel=0.024461351335048676, ref_abs_avg=42.2417106628418, test_abs_avg=42.238590240478516
production_forward grad[29] vs paper_forward: mean_abs=0.7742464542388916, max_abs=3.0, mean_rel=0.09784955531358719, max_rel=9.396230697631836, norm_rel=0.023800577968358994, ref_abs_avg=34.24162292480469, test_abs_avg=34.257869720458984
production_forward grad[30] vs paper_forward: mean_abs=1.026776909828186, max_abs=7.5, mean_rel=0.17800670862197876, max_rel=1894.8336181640625, norm_rel=0.02611943520605564, ref_abs_avg=39.50595474243164, test_abs_avg=39.50656509399414
production_forward grad[31] vs paper_forward: mean_abs=0.9604393839836121, max_abs=5.5, mean_rel=0.38447606563568115, max_rel=3374.999755859375, norm_rel=0.024702655151486397, ref_abs_avg=39.086395263671875, test_abs_avg=39.09325408935547
production_forward grad[32] vs paper_forward: mean_abs=0.7504243850708008, max_abs=3.5, mean_rel=0.18214762210845947, max_rel=31.855989456176758, norm_rel=0.02361699938774109, ref_abs_avg=32.08180618286133, test_abs_avg=32.08692932128906
production_forward grad[33] vs paper_forward: mean_abs=0.9503220319747925, max_abs=7.0, mean_rel=0.1734887659549713, max_rel=988.4869995117188, norm_rel=0.025943249464035034, ref_abs_avg=36.81586456298828, test_abs_avg=36.816261291503906
production_forward grad[34] vs paper_forward: mean_abs=0.8905096054077148, max_abs=5.5, mean_rel=0.3057555556297302, max_rel=2968.749755859375, norm_rel=0.024554694071412086, ref_abs_avg=36.39373016357422, test_abs_avg=36.39971923828125
production_forward grad[35] vs paper_forward: mean_abs=0.6915755271911621, max_abs=3.0, mean_rel=0.11776579171419144, max_rel=10.78503131866455, norm_rel=0.02552172541618347, ref_abs_avg=26.675243377685547, test_abs_avg=26.650177001953125
production_forward grad[36] vs paper_forward: mean_abs=0.8900604844093323, max_abs=6.0, mean_rel=0.16696763038635254, max_rel=1453.685546875, norm_rel=0.025743089616298676, ref_abs_avg=34.74600601196289, test_abs_avg=34.74858856201172
production_forward grad[37] vs paper_forward: mean_abs=0.8305395841598511, max_abs=5.0, mean_rel=0.3162837624549866, max_rel=2624.999755859375, norm_rel=0.02433105558156967, ref_abs_avg=34.249664306640625, test_abs_avg=34.2553596496582
production_forward grad[38] vs paper_forward: mean_abs=0.6898727416992188, max_abs=2.65625, mean_rel=0.13097059726715088, max_rel=17.744857788085938, norm_rel=0.024798773229122162, ref_abs_avg=27.392711639404297, test_abs_avg=27.406986236572266
production_forward grad[39] vs paper_forward: mean_abs=0.8420948386192322, max_abs=5.25, mean_rel=0.16306383907794952, max_rel=1268.4434814453125, norm_rel=0.02552054077386856, ref_abs_avg=33.14080810546875, test_abs_avg=33.143375396728516
production_forward grad[40] vs paper_forward: mean_abs=0.7801799178123474, max_abs=5.0, mean_rel=0.27440565824508667, max_rel=2437.5, norm_rel=0.023983106017112732, ref_abs_avg=32.60578536987305, test_abs_avg=32.603050231933594
production_forward grad[41] vs paper_forward: mean_abs=0.6016067266464233, max_abs=2.75, mean_rel=0.23258474469184875, max_rel=85.22616577148438, norm_rel=0.023210935294628143, ref_abs_avg=26.469982147216797, test_abs_avg=26.47356414794922
production_forward grad[42] vs paper_forward: mean_abs=0.7958917617797852, max_abs=5.5, mean_rel=0.16207298636436462, max_rel=953.03125, norm_rel=0.025351883843541145, ref_abs_avg=31.525787353515625, test_abs_avg=31.527584075927734
production_forward grad[43] vs paper_forward: mean_abs=0.7402136325836182, max_abs=4.703125, mean_rel=0.27203643321990967, max_rel=1999.9998779296875, norm_rel=0.02384117804467678, ref_abs_avg=31.14818000793457, test_abs_avg=31.146968841552734
production_forward grad[44] vs paper_forward: mean_abs=0.5778158903121948, max_abs=2.4375, mean_rel=0.08815513551235199, max_rel=8.148832321166992, norm_rel=0.022962698712944984, ref_abs_avg=25.859731674194336, test_abs_avg=25.815874099731445
production_forward grad[45] vs paper_forward: mean_abs=0.7565361261367798, max_abs=5.0, mean_rel=0.1670190989971161, max_rel=1580.2833251953125, norm_rel=0.02507980912923813, ref_abs_avg=30.27716827392578, test_abs_avg=30.27741813659668
production_forward grad[46] vs paper_forward: mean_abs=0.7025048732757568, max_abs=4.75, mean_rel=0.2459389716386795, max_rel=1999.9998779296875, norm_rel=0.023487376049160957, ref_abs_avg=29.96070671081543, test_abs_avg=29.956335067749023
production_forward grad[47] vs paper_forward: mean_abs=0.5642189979553223, max_abs=2.359375, mean_rel=0.0961722880601883, max_rel=6.4298834800720215, norm_rel=0.0226383525878191, ref_abs_avg=24.624263763427734, test_abs_avg=24.611339569091797
production_forward grad[48] vs paper_forward: mean_abs=0.7243286371231079, max_abs=5.0, mean_rel=0.17066287994384766, max_rel=1439.6793212890625, norm_rel=0.024859679862856865, ref_abs_avg=29.195144653320312, test_abs_avg=29.19660186767578
production_forward grad[49] vs paper_forward: mean_abs=0.6753677129745483, max_abs=4.25, mean_rel=0.2773663401603699, max_rel=2187.5, norm_rel=0.02344677969813347, ref_abs_avg=28.843830108642578, test_abs_avg=28.847593307495117
production_forward grad[50] vs paper_forward: mean_abs=0.6154031753540039, max_abs=2.375, mean_rel=0.07541948556900024, max_rel=3.749476671218872, norm_rel=0.024843649938702583, ref_abs_avg=24.53514289855957, test_abs_avg=24.5362548828125
production_forward grad[51] vs paper_forward: mean_abs=0.8075108528137207, max_abs=6.5, mean_rel=0.16643038392066956, max_rel=1911.9591064453125, norm_rel=0.026111947372555733, ref_abs_avg=31.080154418945312, test_abs_avg=31.08003044128418
production_forward grad[52] vs paper_forward: mean_abs=0.7588822245597839, max_abs=5.0, mean_rel=0.2946099638938904, max_rel=2187.5, norm_rel=0.02489362843334675, ref_abs_avg=30.605276107788086, test_abs_avg=30.600563049316406
production_forward grad[53] vs paper_forward: mean_abs=0.5966715812683105, max_abs=3.2421875, mean_rel=0.09433827549219131, max_rel=6.102224349975586, norm_rel=0.025477919727563858, ref_abs_avg=23.50374412536621, test_abs_avg=23.44845962524414
production_forward grad[54] vs paper_forward: mean_abs=0.7498340606689453, max_abs=5.625, mean_rel=0.1607266068458557, max_rel=726.7100830078125, norm_rel=0.025701571255922318, ref_abs_avg=29.233217239379883, test_abs_avg=29.2305850982666
production_forward grad[55] vs paper_forward: mean_abs=0.6944501996040344, max_abs=5.25, mean_rel=0.2805250883102417, max_rel=2999.999755859375, norm_rel=0.024240506812930107, ref_abs_avg=28.75311851501465, test_abs_avg=28.752756118774414
production_forward grad[56] vs paper_forward: mean_abs=0.5642671585083008, max_abs=2.3125, mean_rel=0.09288714826107025, max_rel=6.597476482391357, norm_rel=0.025746028870344162, ref_abs_avg=21.945964813232422, test_abs_avg=21.97616195678711
production_forward grad[57] vs paper_forward: mean_abs=0.695307731628418, max_abs=5.0, mean_rel=0.1681654155254364, max_rel=1189.8272705078125, norm_rel=0.025305625051259995, ref_abs_avg=27.592235565185547, test_abs_avg=27.59188461303711
production_forward grad[58] vs paper_forward: mean_abs=0.6502193808555603, max_abs=4.5, mean_rel=0.2480611801147461, max_rel=2156.25, norm_rel=0.023907620459794998, ref_abs_avg=27.26534080505371, test_abs_avg=27.26416015625
production_forward grad[59] vs paper_forward: mean_abs=0.5303664207458496, max_abs=1.75, mean_rel=0.19269293546676636, max_rel=43.42554473876953, norm_rel=0.02402595616877079, ref_abs_avg=21.736732482910156, test_abs_avg=21.76513671875
production_forward grad[60] vs paper_forward: mean_abs=0.654163122177124, max_abs=5.0, mean_rel=0.15331366658210754, max_rel=734.3910522460938, norm_rel=0.02468162402510643, ref_abs_avg=26.554515838623047, test_abs_avg=26.557598114013672
production_forward grad[61] vs paper_forward: mean_abs=0.6062237620353699, max_abs=4.25, mean_rel=0.248761847615242, max_rel=1562.4998779296875, norm_rel=0.023244645446538925, ref_abs_avg=26.139297485351562, test_abs_avg=26.141616821289062
production_forward grad[62] vs paper_forward: mean_abs=0.4805574417114258, max_abs=1.9375, mean_rel=0.18351298570632935, max_rel=27.924081802368164, norm_rel=0.02430151402950287, ref_abs_avg=20.25870704650879, test_abs_avg=20.21881103515625
production_forward grad[63] vs paper_forward: mean_abs=0.6125812530517578, max_abs=6.0, mean_rel=0.15746860206127167, max_rel=1338.053955078125, norm_rel=0.024449247866868973, ref_abs_avg=25.1188907623291, test_abs_avg=25.119667053222656
production_forward grad[64] vs paper_forward: mean_abs=0.572971761226654, max_abs=4.5, mean_rel=0.2249567061662674, max_rel=2218.75, norm_rel=0.022818531841039658, ref_abs_avg=25.103477478027344, test_abs_avg=25.10862159729004
production_forward grad[65] vs paper_forward: mean_abs=0.45802831649780273, max_abs=1.75, mean_rel=0.1148831769824028, max_rel=19.1016845703125, norm_rel=0.02335597202181816, ref_abs_avg=19.82560920715332, test_abs_avg=19.839317321777344
production_forward grad[66] vs paper_forward: mean_abs=0.5793835520744324, max_abs=4.5, mean_rel=0.15494677424430847, max_rel=969.2265014648438, norm_rel=0.02406969666481018, ref_abs_avg=24.14640235900879, test_abs_avg=24.14522933959961
production_forward grad[67] vs paper_forward: mean_abs=0.5433881282806396, max_abs=3.5, mean_rel=0.22610576450824738, max_rel=1484.3748779296875, norm_rel=0.022409379482269287, ref_abs_avg=24.236614227294922, test_abs_avg=24.24138069152832
production_forward grad[68] vs paper_forward: mean_abs=0.4210672378540039, max_abs=1.75, mean_rel=0.11251042038202286, max_rel=18.96407127380371, norm_rel=0.02188766375184059, ref_abs_avg=19.723663330078125, test_abs_avg=19.72149658203125
production_forward grad[69] vs paper_forward: mean_abs=0.5567810535430908, max_abs=5.0, mean_rel=0.14821189641952515, max_rel=1043.4354248046875, norm_rel=0.023516377434134483, ref_abs_avg=23.66832733154297, test_abs_avg=23.66870880126953
production_forward grad[70] vs paper_forward: mean_abs=0.511101245880127, max_abs=4.0, mean_rel=0.22406229376792908, max_rel=1593.7498779296875, norm_rel=0.022062208503484726, ref_abs_avg=23.237186431884766, test_abs_avg=23.23737144470215
production_forward grad[71] vs paper_forward: mean_abs=0.42208099365234375, max_abs=1.71875, mean_rel=0.12686192989349365, max_rel=21.91844367980957, norm_rel=0.022650044411420822, ref_abs_avg=18.641746520996094, test_abs_avg=18.63016128540039
production_forward grad[72] vs paper_forward: mean_abs=0.5313076972961426, max_abs=4.5, mean_rel=0.1519087702035904, max_rel=1068.4964599609375, norm_rel=0.023327060043811798, ref_abs_avg=22.810489654541016, test_abs_avg=22.8115177154541
production_forward grad[73] vs paper_forward: mean_abs=0.4886992871761322, max_abs=3.5, mean_rel=0.2281571626663208, max_rel=1749.9998779296875, norm_rel=0.021306313574314117, ref_abs_avg=22.852657318115234, test_abs_avg=22.856693267822266
production_forward grad[74] vs paper_forward: mean_abs=0.4632892608642578, max_abs=2.4375, mean_rel=0.10013650357723236, max_rel=7.487711429595947, norm_rel=0.023127585649490356, ref_abs_avg=20.434337615966797, test_abs_avg=20.495723724365234
production_forward grad[75] vs paper_forward: mean_abs=0.6048902869224548, max_abs=4.5, mean_rel=0.155057892203331, max_rel=810.4534912109375, norm_rel=0.024307813495397568, ref_abs_avg=24.97972869873047, test_abs_avg=24.980083465576172
production_forward grad[76] vs paper_forward: mean_abs=0.5577971935272217, max_abs=4.5, mean_rel=0.1944172978401184, max_rel=1890.6248779296875, norm_rel=0.02224687486886978, ref_abs_avg=25.0920467376709, test_abs_avg=25.096256256103516
production_forward grad[77] vs paper_forward: mean_abs=0.4374431371688843, max_abs=2.0625, mean_rel=0.08243371546268463, max_rel=5.123405933380127, norm_rel=0.022829150781035423, ref_abs_avg=19.377443313598633, test_abs_avg=19.38473892211914
production_forward grad[78] vs paper_forward: mean_abs=0.5498284101486206, max_abs=6.0, mean_rel=0.14487944543361664, max_rel=880.48046875, norm_rel=0.02386215329170227, ref_abs_avg=23.10856819152832, test_abs_avg=23.110036849975586
production_forward grad[79] vs paper_forward: mean_abs=0.5128642320632935, max_abs=4.1875, mean_rel=0.18979474902153015, max_rel=1499.9998779296875, norm_rel=0.022090084850788116, ref_abs_avg=23.236553192138672, test_abs_avg=23.237411499023438
production_forward grad[80] vs paper_forward: mean_abs=0.40189456939697266, max_abs=1.65625, mean_rel=0.0964183360338211, max_rel=9.442381858825684, norm_rel=0.021209511905908585, ref_abs_avg=18.730911254882812, test_abs_avg=18.758224487304688
production_forward grad[81] vs paper_forward: mean_abs=0.5113787651062012, max_abs=4.5, mean_rel=0.13982385396957397, max_rel=865.95556640625, norm_rel=0.023163892328739166, ref_abs_avg=22.169921875, test_abs_avg=22.171239852905273
production_forward grad[82] vs paper_forward: mean_abs=0.4706631600856781, max_abs=3.75, mean_rel=0.22481238842010498, max_rel=1593.7498779296875, norm_rel=0.020897986367344856, ref_abs_avg=22.55801773071289, test_abs_avg=22.555702209472656
production_forward grad[83] vs paper_forward: mean_abs=0.3981046676635742, max_abs=1.5, mean_rel=0.11076825112104416, max_rel=15.151947021484375, norm_rel=0.022462494671344757, ref_abs_avg=17.997631072998047, test_abs_avg=17.969633102416992
production_forward grad[84] vs paper_forward: mean_abs=0.4848060607910156, max_abs=4.5, mean_rel=0.14854781329631805, max_rel=718.9747924804688, norm_rel=0.02257947064936161, ref_abs_avg=21.562923431396484, test_abs_avg=21.564237594604492
production_forward grad[85] vs paper_forward: mean_abs=0.43900495767593384, max_abs=3.5625, mean_rel=0.19930142164230347, max_rel=1437.4998779296875, norm_rel=0.02057771198451519, ref_abs_avg=21.369182586669922, test_abs_avg=21.371362686157227
production_forward grad[86] vs paper_forward: mean_abs=0.34452182054519653, max_abs=1.5625, mean_rel=0.4183313846588135, max_rel=166.99822998046875, norm_rel=0.020208677276968956, ref_abs_avg=17.463703155517578, test_abs_avg=17.43436050415039
production_forward grad[87] vs paper_forward: mean_abs=0.4572767913341522, max_abs=7.0, mean_rel=0.1324518620967865, max_rel=843.2081298828125, norm_rel=0.021983753889799118, ref_abs_avg=20.942527770996094, test_abs_avg=20.943950653076172
production_forward grad[88] vs paper_forward: mean_abs=0.4120403528213501, max_abs=3.375, mean_rel=0.17241771519184113, max_rel=1281.25, norm_rel=0.020451141521334648, ref_abs_avg=20.21017074584961, test_abs_avg=20.213542938232422
production_forward grad[89] vs paper_forward: mean_abs=0.340212345123291, max_abs=1.5, mean_rel=0.07410695403814316, max_rel=8.637124061584473, norm_rel=0.021028876304626465, ref_abs_avg=16.982730865478516, test_abs_avg=16.966468811035156
production_forward grad[90] vs paper_forward: mean_abs=0.4239892065525055, max_abs=6.0, mean_rel=0.1323791742324829, max_rel=745.4110107421875, norm_rel=0.02155076526105404, ref_abs_avg=19.855121612548828, test_abs_avg=19.85552978515625
production_forward grad[91] vs paper_forward: mean_abs=0.38353216648101807, max_abs=3.5, mean_rel=0.19843849539756775, max_rel=1281.25, norm_rel=0.019920576363801956, ref_abs_avg=19.42510414123535, test_abs_avg=19.427207946777344
production_forward grad[92] vs paper_forward: mean_abs=0.3185293674468994, max_abs=1.28125, mean_rel=0.09008385241031647, max_rel=8.241053581237793, norm_rel=0.02120351418852806, ref_abs_avg=15.166847229003906, test_abs_avg=15.19051742553711
production_forward grad[93] vs paper_forward: mean_abs=0.40753304958343506, max_abs=4.359375, mean_rel=0.12953820824623108, max_rel=647.845703125, norm_rel=0.021274009719491005, ref_abs_avg=19.407344818115234, test_abs_avg=19.408977508544922
production_forward grad[94] vs paper_forward: mean_abs=0.36719924211502075, max_abs=3.5, mean_rel=0.17725306749343872, max_rel=1312.4998779296875, norm_rel=0.0191023126244545, ref_abs_avg=19.299041748046875, test_abs_avg=19.29595375061035
production_forward grad[95] vs paper_forward: mean_abs=0.3110518455505371, max_abs=1.0, mean_rel=0.0832880288362503, max_rel=6.078653335571289, norm_rel=0.01949404925107956, ref_abs_avg=15.70404052734375, test_abs_avg=15.714378356933594
production_forward grad[96] vs paper_forward: mean_abs=0.3770543932914734, max_abs=4.5, mean_rel=0.12692981958389282, max_rel=933.27783203125, norm_rel=0.020688753575086594, ref_abs_avg=18.546926498413086, test_abs_avg=18.546092987060547
production_forward grad[97] vs paper_forward: mean_abs=0.3506653308868408, max_abs=3.5, mean_rel=0.17946857213974, max_rel=1812.4998779296875, norm_rel=0.019191430881619453, ref_abs_avg=18.506404876708984, test_abs_avg=18.507673263549805
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016658550594002008, max_abs=0.046875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008785944432020187, max_abs=0.3203125, mean_rel=0.07622307538986206, max_rel=108.0760498046875, norm_rel=0.020893927663564682, ref_abs_avg=0.453069269657135, test_abs_avg=0.45307081937789917
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.505645751953125, max_abs=56.0, mean_rel=0.27825048565864563, max_rel=813.8590087890625, norm_rel=0.021346936002373695, ref_abs_avg=317.2802429199219, test_abs_avg=317.2830505371094
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.1964168548583984, max_abs=4.5, mean_rel=0.0890105813741684, max_rel=3.579941511154175, norm_rel=0.02413986250758171, ref_abs_avg=48.644569396972656, test_abs_avg=48.69227981567383
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6234773397445679, max_abs=10.0, mean_rel=0.18062888085842133, max_rel=1781.1180419921875, norm_rel=0.0256055798381567, ref_abs_avg=63.85111618041992, test_abs_avg=63.85887145996094
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.515416145324707, max_abs=10.0, mean_rel=0.4934957027435303, max_rel=6249.99951171875, norm_rel=0.02399578131735325, ref_abs_avg=63.49517822265625, test_abs_avg=63.51080322265625
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1007535457611084, max_abs=4.41845703125, mean_rel=0.14764218032360077, max_rel=23.73189353942871, norm_rel=0.02319829724729061, ref_abs_avg=47.90660095214844, test_abs_avg=47.95809555053711
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4326871633529663, max_abs=10.0, mean_rel=0.16556115448474884, max_rel=2037.646484375, norm_rel=0.025325115770101547, ref_abs_avg=56.942420959472656, test_abs_avg=56.9400634765625
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3299002647399902, max_abs=8.5, mean_rel=0.31771761178970337, max_rel=3749.999755859375, norm_rel=0.023661281913518906, ref_abs_avg=56.550445556640625, test_abs_avg=56.55378723144531
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0031940937042236, max_abs=4.375, mean_rel=0.09097374975681305, max_rel=4.909365177154541, norm_rel=0.023750320076942444, ref_abs_avg=41.8624267578125, test_abs_avg=41.89807891845703
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3105064630508423, max_abs=9.0, mean_rel=0.17910803854465485, max_rel=3115.2421875, norm_rel=0.025065110996365547, ref_abs_avg=52.54780197143555, test_abs_avg=52.54989242553711
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.212517261505127, max_abs=7.0, mean_rel=0.35292959213256836, max_rel=4125.0, norm_rel=0.023351725190877914, ref_abs_avg=52.1676139831543, test_abs_avg=52.1625862121582
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9281048774719238, max_abs=4.0, mean_rel=0.11228062212467194, max_rel=10.18490982055664, norm_rel=0.023952385410666466, ref_abs_avg=39.120994567871094, test_abs_avg=39.14237976074219
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2189425230026245, max_abs=8.0, mean_rel=0.17656677961349487, max_rel=1625.45556640625, norm_rel=0.02488267607986927, ref_abs_avg=49.30253219604492, test_abs_avg=49.30083465576172
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1234421730041504, max_abs=6.5, mean_rel=0.3562557101249695, max_rel=3156.249755859375, norm_rel=0.023469630628824234, ref_abs_avg=48.1954460144043, test_abs_avg=48.1986083984375
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.9072807431221008, max_abs=4.25, mean_rel=0.6175199151039124, max_rel=273.3587341308594, norm_rel=0.023854652419686317, ref_abs_avg=37.9934196472168, test_abs_avg=37.88486099243164
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1433733701705933, max_abs=7.0, mean_rel=0.16100165247917175, max_rel=1825.0472412109375, norm_rel=0.024738317355513573, ref_abs_avg=46.49367904663086, test_abs_avg=46.49162292480469
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0584168434143066, max_abs=6.75, mean_rel=0.34110361337661743, max_rel=3156.249755859375, norm_rel=0.02306593768298626, ref_abs_avg=46.099632263183594, test_abs_avg=46.101139068603516
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8488786220550537, max_abs=2.875, mean_rel=0.16282570362091064, max_rel=17.672483444213867, norm_rel=0.024591295048594475, ref_abs_avg=34.75095748901367, test_abs_avg=34.70992660522461
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0722529888153076, max_abs=7.0, mean_rel=0.16653059422969818, max_rel=2128.093994140625, norm_rel=0.02462291158735752, ref_abs_avg=43.780189514160156, test_abs_avg=43.77867889404297
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9915816187858582, max_abs=6.5, mean_rel=0.3392045795917511, max_rel=3624.999755859375, norm_rel=0.02298753894865513, ref_abs_avg=43.366493225097656, test_abs_avg=43.36707305908203
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.830873966217041, max_abs=3.5, mean_rel=0.1128016859292984, max_rel=9.418721199035645, norm_rel=0.02645287476480007, ref_abs_avg=32.223350524902344, test_abs_avg=32.17948913574219
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0134227275848389, max_abs=7.0, mean_rel=0.16855216026306152, max_rel=1822.917236328125, norm_rel=0.02429608255624771, ref_abs_avg=41.9398193359375, test_abs_avg=41.94328308105469
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9334071278572083, max_abs=5.5, mean_rel=0.36338382959365845, max_rel=3874.999755859375, norm_rel=0.022784490138292313, ref_abs_avg=41.16614532470703, test_abs_avg=41.16552734375
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.700592041015625, max_abs=3.5, mean_rel=0.08031072467565536, max_rel=5.5996317863464355, norm_rel=0.02323714829981327, ref_abs_avg=31.013700485229492, test_abs_avg=31.074209213256836
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9672026634216309, max_abs=7.0, mean_rel=0.1607007086277008, max_rel=1429.5802001953125, norm_rel=0.024183064699172974, ref_abs_avg=40.20350646972656, test_abs_avg=40.205387115478516
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.891511082649231, max_abs=5.25, mean_rel=0.27242350578308105, max_rel=3124.999755859375, norm_rel=0.022631557658314705, ref_abs_avg=39.53282928466797, test_abs_avg=39.531925201416016
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8787784576416016, max_abs=3.3125, mean_rel=0.19272397458553314, max_rel=28.06122589111328, norm_rel=0.025979777798056602, ref_abs_avg=33.07460021972656, test_abs_avg=33.10902404785156
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1196284294128418, max_abs=9.0, mean_rel=0.1854991912841797, max_rel=1636.4605712890625, norm_rel=0.02620859444141388, ref_abs_avg=42.867958068847656, test_abs_avg=42.86677932739258
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0495600700378418, max_abs=6.5, mean_rel=0.39597368240356445, max_rel=3796.874755859375, norm_rel=0.025003693997859955, ref_abs_avg=42.2417106628418, test_abs_avg=42.235557556152344
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7683753967285156, max_abs=3.0, mean_rel=0.07858945429325104, max_rel=5.125515937805176, norm_rel=0.023638447746634483, ref_abs_avg=34.24162292480469, test_abs_avg=34.25535583496094
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0494697093963623, max_abs=7.03125, mean_rel=0.18394172191619873, max_rel=3055.969970703125, norm_rel=0.0266804788261652, ref_abs_avg=39.50595474243164, test_abs_avg=39.50529861450195
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9800440669059753, max_abs=6.0, mean_rel=0.39318394660949707, max_rel=3249.999755859375, norm_rel=0.02520289272069931, ref_abs_avg=39.086395263671875, test_abs_avg=39.09206771850586
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7735960483551025, max_abs=3.0, mean_rel=0.193558931350708, max_rel=27.444629669189453, norm_rel=0.024337729439139366, ref_abs_avg=32.08180618286133, test_abs_avg=32.06055450439453
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9688276052474976, max_abs=7.0, mean_rel=0.1766461431980133, max_rel=1733.49853515625, norm_rel=0.026436835527420044, ref_abs_avg=36.81586456298828, test_abs_avg=36.81581115722656
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.907617449760437, max_abs=5.5, mean_rel=0.3084510564804077, max_rel=2406.25, norm_rel=0.025035152211785316, ref_abs_avg=36.39373016357422, test_abs_avg=36.395904541015625
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7127029895782471, max_abs=3.25, mean_rel=0.12595102190971375, max_rel=10.716959953308105, norm_rel=0.026570821180939674, ref_abs_avg=26.675243377685547, test_abs_avg=26.6855411529541
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.9057362079620361, max_abs=6.0, mean_rel=0.17163196206092834, max_rel=1266.2510986328125, norm_rel=0.02619144879281521, ref_abs_avg=34.74600601196289, test_abs_avg=34.749752044677734
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8488084077835083, max_abs=5.375, mean_rel=0.32542526721954346, max_rel=2749.999755859375, norm_rel=0.024857575073838234, ref_abs_avg=34.249664306640625, test_abs_avg=34.25593566894531
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6508355140686035, max_abs=2.375, mean_rel=0.11261829733848572, max_rel=10.431617736816406, norm_rel=0.02429009974002838, ref_abs_avg=27.392711639404297, test_abs_avg=27.405487060546875
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8556903600692749, max_abs=6.0, mean_rel=0.16803398728370667, max_rel=1473.8267822265625, norm_rel=0.02592111937701702, ref_abs_avg=33.14080810546875, test_abs_avg=33.14323425292969
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7946057319641113, max_abs=5.375, mean_rel=0.27297383546829224, max_rel=2828.124755859375, norm_rel=0.02443930320441723, ref_abs_avg=32.60578536987305, test_abs_avg=32.603050231933594
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6226404905319214, max_abs=2.75, mean_rel=0.4569177031517029, max_rel=194.25379943847656, norm_rel=0.023514274507761, ref_abs_avg=26.469982147216797, test_abs_avg=26.47345733642578
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.8072212338447571, max_abs=5.25, mean_rel=0.16784287989139557, max_rel=903.15966796875, norm_rel=0.025703132152557373, ref_abs_avg=31.525787353515625, test_abs_avg=31.527647018432617
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7500486373901367, max_abs=4.75, mean_rel=0.27695539593696594, max_rel=2375.0, norm_rel=0.02415655180811882, ref_abs_avg=31.14818000793457, test_abs_avg=31.14651107788086
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5699520111083984, max_abs=2.5, mean_rel=0.07304460555315018, max_rel=2.101271152496338, norm_rel=0.022588608786463737, ref_abs_avg=25.859731674194336, test_abs_avg=25.819324493408203
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7663595080375671, max_abs=6.0, mean_rel=0.16806480288505554, max_rel=1565.5836181640625, norm_rel=0.025402357801795006, ref_abs_avg=30.27716827392578, test_abs_avg=30.27712631225586
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7126901149749756, max_abs=5.0, mean_rel=0.24967890977859497, max_rel=1999.9998779296875, norm_rel=0.023840848356485367, ref_abs_avg=29.96070671081543, test_abs_avg=29.956344604492188
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5877120494842529, max_abs=2.265625, mean_rel=0.11445806920528412, max_rel=13.377350807189941, norm_rel=0.023502439260482788, ref_abs_avg=24.624263763427734, test_abs_avg=24.59427261352539
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7328282594680786, max_abs=5.0, mean_rel=0.1714896559715271, max_rel=1385.854248046875, norm_rel=0.025159036740660667, ref_abs_avg=29.195144653320312, test_abs_avg=29.196176528930664
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6856685876846313, max_abs=5.0, mean_rel=0.2734902501106262, max_rel=2437.5, norm_rel=0.023783953860402107, ref_abs_avg=28.843830108642578, test_abs_avg=28.847328186035156
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6241664886474609, max_abs=2.25, mean_rel=0.07711520791053772, max_rel=2.679858684539795, norm_rel=0.02497248165309429, ref_abs_avg=24.53514289855957, test_abs_avg=24.55322265625
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8194262981414795, max_abs=6.0, mean_rel=0.17370004951953888, max_rel=1233.022705078125, norm_rel=0.026475364342331886, ref_abs_avg=31.080154418945312, test_abs_avg=31.07878303527832
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7718082070350647, max_abs=5.0, mean_rel=0.31406480073928833, max_rel=2093.75, norm_rel=0.02531689777970314, ref_abs_avg=30.605276107788086, test_abs_avg=30.599895477294922
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.6253665089607239, max_abs=2.7734375, mean_rel=0.09879297018051147, max_rel=6.364450931549072, norm_rel=0.026687245815992355, ref_abs_avg=23.50374412536621, test_abs_avg=23.45656967163086
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7592469453811646, max_abs=6.0, mean_rel=0.1675335019826889, max_rel=1044.6954345703125, norm_rel=0.026034867390990257, ref_abs_avg=29.233217239379883, test_abs_avg=29.23179817199707
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.705633819103241, max_abs=4.484375, mean_rel=0.2969015836715698, max_rel=2812.499755859375, norm_rel=0.02464551478624344, ref_abs_avg=28.75311851501465, test_abs_avg=28.751008987426758
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5622682571411133, max_abs=2.0, mean_rel=0.10082449018955231, max_rel=7.357911109924316, norm_rel=0.025605343282222748, ref_abs_avg=21.945964813232422, test_abs_avg=21.961101531982422
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.7045300006866455, max_abs=5.0, mean_rel=0.17237235605716705, max_rel=1661.5784912109375, norm_rel=0.02563386969268322, ref_abs_avg=27.592235565185547, test_abs_avg=27.59225845336914
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6606683135032654, max_abs=4.5, mean_rel=0.2713063955307007, max_rel=2546.875, norm_rel=0.02426709420979023, ref_abs_avg=27.26534080505371, test_abs_avg=27.264873504638672
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5343914031982422, max_abs=2.0, mean_rel=0.18976765871047974, max_rel=46.529449462890625, norm_rel=0.02431056648492813, ref_abs_avg=21.736732482910156, test_abs_avg=21.76719856262207
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6606652736663818, max_abs=4.5, mean_rel=0.15602444112300873, max_rel=627.8009033203125, norm_rel=0.024938279762864113, ref_abs_avg=26.554515838623047, test_abs_avg=26.558263778686523
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6151285171508789, max_abs=4.875, mean_rel=0.2590393126010895, max_rel=1921.8748779296875, norm_rel=0.023579522967338562, ref_abs_avg=26.139297485351562, test_abs_avg=26.13976287841797
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.47362327575683594, max_abs=2.0, mean_rel=0.14575661718845367, max_rel=18.933889389038086, norm_rel=0.02441825345158577, ref_abs_avg=20.25870704650879, test_abs_avg=20.2143611907959
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6184418201446533, max_abs=5.0, mean_rel=0.15821301937103271, max_rel=1179.426513671875, norm_rel=0.024680403992533684, ref_abs_avg=25.1188907623291, test_abs_avg=25.118471145629883
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.576947808265686, max_abs=4.125, mean_rel=0.23428234457969666, max_rel=2624.999755859375, norm_rel=0.022971663624048233, ref_abs_avg=25.103477478027344, test_abs_avg=25.105928421020508
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4619014263153076, max_abs=1.75, mean_rel=0.11730287224054337, max_rel=18.339998245239258, norm_rel=0.02314102277159691, ref_abs_avg=19.82560920715332, test_abs_avg=19.83447265625
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5843621492385864, max_abs=4.5, mean_rel=0.15645572543144226, max_rel=1030.904541015625, norm_rel=0.024278415367007256, ref_abs_avg=24.14640235900879, test_abs_avg=24.146865844726562
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5469931960105896, max_abs=3.5, mean_rel=0.24145880341529846, max_rel=1624.9998779296875, norm_rel=0.02256391942501068, ref_abs_avg=24.236614227294922, test_abs_avg=24.24161148071289
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.43999338150024414, max_abs=2.0, mean_rel=0.1010700911283493, max_rel=7.514066219329834, norm_rel=0.02323880046606064, ref_abs_avg=19.723663330078125, test_abs_avg=19.72248649597168
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5623099207878113, max_abs=4.0, mean_rel=0.14351806044578552, max_rel=589.5296630859375, norm_rel=0.023743988946080208, ref_abs_avg=23.66832733154297, test_abs_avg=23.669692993164062
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5166826248168945, max_abs=4.0, mean_rel=0.22603680193424225, max_rel=1593.7498779296875, norm_rel=0.02231213077902794, ref_abs_avg=23.237186431884766, test_abs_avg=23.235065460205078
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4294954538345337, max_abs=2.0, mean_rel=0.15331563353538513, max_rel=37.8281364440918, norm_rel=0.02316875383257866, ref_abs_avg=18.641746520996094, test_abs_avg=18.644039154052734
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5356115102767944, max_abs=5.0, mean_rel=0.1561196744441986, max_rel=1340.263427734375, norm_rel=0.02349974773824215, ref_abs_avg=22.810489654541016, test_abs_avg=22.81112289428711
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.49026721715927124, max_abs=4.0, mean_rel=0.22097493708133698, max_rel=1593.7498779296875, norm_rel=0.02141798846423626, ref_abs_avg=22.852657318115234, test_abs_avg=22.859607696533203
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4719700813293457, max_abs=1.8125, mean_rel=0.10218772292137146, max_rel=6.050570487976074, norm_rel=0.02348402515053749, ref_abs_avg=20.434337615966797, test_abs_avg=20.45995330810547
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.612708330154419, max_abs=4.5, mean_rel=0.15543586015701294, max_rel=1047.662353515625, norm_rel=0.024610323831439018, ref_abs_avg=24.97972869873047, test_abs_avg=24.980003356933594
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5661028623580933, max_abs=4.0625, mean_rel=0.211370050907135, max_rel=1609.3748779296875, norm_rel=0.022597339004278183, ref_abs_avg=25.0920467376709, test_abs_avg=25.09946060180664
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4484734535217285, max_abs=2.0625, mean_rel=0.07916611433029175, max_rel=6.217310905456543, norm_rel=0.023106854408979416, ref_abs_avg=19.377443313598633, test_abs_avg=19.374897003173828
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5551256537437439, max_abs=4.625, mean_rel=0.14454400539398193, max_rel=1100.7200927734375, norm_rel=0.02407698519527912, ref_abs_avg=23.10856819152832, test_abs_avg=23.10973358154297
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5189666748046875, max_abs=4.25, mean_rel=0.2110566794872284, max_rel=1437.4998779296875, norm_rel=0.022355906665325165, ref_abs_avg=23.236553192138672, test_abs_avg=23.234094619750977
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.4027712345123291, max_abs=2.0625, mean_rel=0.10875628888607025, max_rel=19.295303344726562, norm_rel=0.021850556135177612, ref_abs_avg=18.730911254882812, test_abs_avg=18.75945472717285
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.516309916973114, max_abs=4.125, mean_rel=0.14158867299556732, max_rel=696.6373901367188, norm_rel=0.023359986022114754, ref_abs_avg=22.169921875, test_abs_avg=22.171566009521484
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4797514081001282, max_abs=4.0, mean_rel=0.22525227069854736, max_rel=1562.4998779296875, norm_rel=0.021312473341822624, ref_abs_avg=22.55801773071289, test_abs_avg=22.556106567382812
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3981752395629883, max_abs=1.6875, mean_rel=0.11764930933713913, max_rel=18.236604690551758, norm_rel=0.022904347628355026, ref_abs_avg=17.997631072998047, test_abs_avg=17.962116241455078
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4878723919391632, max_abs=4.375, mean_rel=0.148182675242424, max_rel=646.8098754882812, norm_rel=0.022707486525177956, ref_abs_avg=21.562923431396484, test_abs_avg=21.56468963623047
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.442178875207901, max_abs=3.5, mean_rel=0.19625608623027802, max_rel=1250.0, norm_rel=0.020690614357590675, ref_abs_avg=21.369182586669922, test_abs_avg=21.37392234802246
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.35361719131469727, max_abs=1.5, mean_rel=0.4444853663444519, max_rel=178.4924774169922, norm_rel=0.02072620950639248, ref_abs_avg=17.463703155517578, test_abs_avg=17.4383544921875
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.46039149165153503, max_abs=7.5, mean_rel=0.1331903636455536, max_rel=625.0929565429688, norm_rel=0.022108687087893486, ref_abs_avg=20.942527770996094, test_abs_avg=20.944528579711914
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.41332414746284485, max_abs=3.75, mean_rel=0.17705637216567993, max_rel=1468.7498779296875, norm_rel=0.020557299256324768, ref_abs_avg=20.21017074584961, test_abs_avg=20.215377807617188
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3318214416503906, max_abs=1.375, mean_rel=0.10603035986423492, max_rel=10.865541458129883, norm_rel=0.020381541922688484, ref_abs_avg=16.982730865478516, test_abs_avg=16.987171173095703
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4263569712638855, max_abs=4.5, mean_rel=0.13352033495903015, max_rel=1124.738037109375, norm_rel=0.021642539650201797, ref_abs_avg=19.855121612548828, test_abs_avg=19.85611915588379
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3902033567428589, max_abs=3.5, mean_rel=0.20097601413726807, max_rel=1031.25, norm_rel=0.02028333581984043, ref_abs_avg=19.42510414123535, test_abs_avg=19.423147201538086
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3247535228729248, max_abs=1.25, mean_rel=0.0832744911313057, max_rel=6.703065395355225, norm_rel=0.021278398111462593, ref_abs_avg=15.166847229003906, test_abs_avg=15.196508407592773
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.4087149500846863, max_abs=4.0, mean_rel=0.12946993112564087, max_rel=766.7313842773438, norm_rel=0.021329041570425034, ref_abs_avg=19.407344818115234, test_abs_avg=19.409366607666016
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.36462241411209106, max_abs=3.6875, mean_rel=0.17844998836517334, max_rel=999.9999389648438, norm_rel=0.018982229754328728, ref_abs_avg=19.299041748046875, test_abs_avg=19.297170639038086
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.30170726776123047, max_abs=1.125, mean_rel=0.08554646372795105, max_rel=7.91775369644165, norm_rel=0.01928665116429329, ref_abs_avg=15.70404052734375, test_abs_avg=15.726351737976074
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3771675229072571, max_abs=4.25, mean_rel=0.12619055807590485, max_rel=663.5068359375, norm_rel=0.020709658041596413, ref_abs_avg=18.546926498413086, test_abs_avg=18.547082901000977
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.35387277603149414, max_abs=4.0, mean_rel=0.18229267001152039, max_rel=1562.4998779296875, norm_rel=0.01951632834970951, ref_abs_avg=18.506404876708984, test_abs_avg=18.505775451660156
production_forward2 vs paper_forward output: mean_abs=0.001662238035351038, max_abs=0.046875
production_forward2 grad[0] vs paper_forward: mean_abs=0.008774567395448685, max_abs=0.3046875, mean_rel=0.07610750943422318, max_rel=126.51228332519531, norm_rel=0.020872846245765686, ref_abs_avg=0.453069269657135, test_abs_avg=0.4530709683895111
production_forward2 grad[1] vs paper_forward: mean_abs=7.4962873458862305, max_abs=56.0, mean_rel=0.25031840801239014, max_rel=636.6396484375, norm_rel=0.021285424008965492, ref_abs_avg=317.2802429199219, test_abs_avg=317.2536926269531
production_forward2 grad[2] vs paper_forward: mean_abs=1.2485713958740234, max_abs=5.0, mean_rel=0.09703462570905685, max_rel=4.505438327789307, norm_rel=0.025124818086624146, ref_abs_avg=48.644569396972656, test_abs_avg=48.686153411865234
production_forward2 grad[3] vs paper_forward: mean_abs=1.6195881366729736, max_abs=12.0, mean_rel=0.18286897242069244, max_rel=1650.5482177734375, norm_rel=0.025555651634931564, ref_abs_avg=63.85111618041992, test_abs_avg=63.856258392333984
production_forward2 grad[4] vs paper_forward: mean_abs=1.509764552116394, max_abs=11.0, mean_rel=0.4912821054458618, max_rel=5125.0, norm_rel=0.023921385407447815, ref_abs_avg=63.49517822265625, test_abs_avg=63.50872039794922
production_forward2 grad[5] vs paper_forward: mean_abs=1.058717131614685, max_abs=3.875, mean_rel=0.1411130130290985, max_rel=25.278364181518555, norm_rel=0.022701261565089226, ref_abs_avg=47.90660095214844, test_abs_avg=47.89292907714844
production_forward2 grad[6] vs paper_forward: mean_abs=1.4309202432632446, max_abs=9.5, mean_rel=0.1663910150527954, max_rel=1942.716064453125, norm_rel=0.025288481265306473, ref_abs_avg=56.942420959472656, test_abs_avg=56.9410400390625
production_forward2 grad[7] vs paper_forward: mean_abs=1.3330433368682861, max_abs=8.25, mean_rel=0.3435145616531372, max_rel=4125.0, norm_rel=0.023713210597634315, ref_abs_avg=56.550445556640625, test_abs_avg=56.547637939453125
production_forward2 grad[8] vs paper_forward: mean_abs=0.9934583902359009, max_abs=4.125, mean_rel=0.09696771204471588, max_rel=4.19839334487915, norm_rel=0.023759610950946808, ref_abs_avg=41.8624267578125, test_abs_avg=41.9603271484375
production_forward2 grad[9] vs paper_forward: mean_abs=1.3090201616287231, max_abs=9.0, mean_rel=0.1786416471004486, max_rel=2769.76806640625, norm_rel=0.025023888796567917, ref_abs_avg=52.54780197143555, test_abs_avg=52.55126190185547
production_forward2 grad[10] vs paper_forward: mean_abs=1.216882348060608, max_abs=7.0, mean_rel=0.3605296313762665, max_rel=4218.75, norm_rel=0.023443259298801422, ref_abs_avg=52.1676139831543, test_abs_avg=52.16362380981445
production_forward2 grad[11] vs paper_forward: mean_abs=0.9323010444641113, max_abs=3.5, mean_rel=0.0994783267378807, max_rel=9.812857627868652, norm_rel=0.02390165999531746, ref_abs_avg=39.120994567871094, test_abs_avg=39.119659423828125
production_forward2 grad[12] vs paper_forward: mean_abs=1.2168668508529663, max_abs=8.0, mean_rel=0.17166002094745636, max_rel=1113.7265625, norm_rel=0.024846524000167847, ref_abs_avg=49.30253219604492, test_abs_avg=49.3004150390625
production_forward2 grad[13] vs paper_forward: mean_abs=1.1245534420013428, max_abs=6.75, mean_rel=0.3575914204120636, max_rel=2999.999755859375, norm_rel=0.023479340597987175, ref_abs_avg=48.1954460144043, test_abs_avg=48.19776153564453
production_forward2 grad[14] vs paper_forward: mean_abs=0.8758209347724915, max_abs=3.5, mean_rel=0.6509202122688293, max_rel=288.09100341796875, norm_rel=0.02302473597228527, ref_abs_avg=37.9934196472168, test_abs_avg=37.884979248046875
production_forward2 grad[15] vs paper_forward: mean_abs=1.1439075469970703, max_abs=8.0, mean_rel=0.1622237265110016, max_rel=1704.026123046875, norm_rel=0.02473234012722969, ref_abs_avg=46.49367904663086, test_abs_avg=46.49314498901367
production_forward2 grad[16] vs paper_forward: mean_abs=1.056997299194336, max_abs=6.5, mean_rel=0.33967721462249756, max_rel=2999.999755859375, norm_rel=0.02303379587829113, ref_abs_avg=46.099632263183594, test_abs_avg=46.10283660888672
production_forward2 grad[17] vs paper_forward: mean_abs=0.8206167221069336, max_abs=2.75, mean_rel=0.15731535851955414, max_rel=25.03778076171875, norm_rel=0.023740282282233238, ref_abs_avg=34.75095748901367, test_abs_avg=34.72051239013672
production_forward2 grad[18] vs paper_forward: mean_abs=1.0708380937576294, max_abs=7.0, mean_rel=0.16613733768463135, max_rel=2536.208740234375, norm_rel=0.024594556540250778, ref_abs_avg=43.780189514160156, test_abs_avg=43.77897262573242
production_forward2 grad[19] vs paper_forward: mean_abs=0.9898890852928162, max_abs=6.125, mean_rel=0.33655548095703125, max_rel=3499.999755859375, norm_rel=0.022941330447793007, ref_abs_avg=43.366493225097656, test_abs_avg=43.36854934692383
production_forward2 grad[20] vs paper_forward: mean_abs=0.8039340972900391, max_abs=3.0625, mean_rel=0.11866971850395203, max_rel=8.72859001159668, norm_rel=0.02581961825489998, ref_abs_avg=32.223350524902344, test_abs_avg=32.199371337890625
production_forward2 grad[21] vs paper_forward: mean_abs=1.0126996040344238, max_abs=6.0, mean_rel=0.1651509702205658, max_rel=1624.96435546875, norm_rel=0.024277301505208015, ref_abs_avg=41.9398193359375, test_abs_avg=41.941932678222656
production_forward2 grad[22] vs paper_forward: mean_abs=0.9334821701049805, max_abs=5.5, mean_rel=0.33567917346954346, max_rel=3031.249755859375, norm_rel=0.022775905206799507, ref_abs_avg=41.16614532470703, test_abs_avg=41.16722869873047
production_forward2 grad[23] vs paper_forward: mean_abs=0.7542695999145508, max_abs=3.0, mean_rel=0.08369041234254837, max_rel=5.117471694946289, norm_rel=0.024014854803681374, ref_abs_avg=31.013700485229492, test_abs_avg=31.086807250976562
production_forward2 grad[24] vs paper_forward: mean_abs=0.9659126400947571, max_abs=8.0, mean_rel=0.15711376070976257, max_rel=1166.5390625, norm_rel=0.02415931224822998, ref_abs_avg=40.20350646972656, test_abs_avg=40.2047233581543
production_forward2 grad[25] vs paper_forward: mean_abs=0.8910778164863586, max_abs=5.25, mean_rel=0.26499509811401367, max_rel=3374.999755859375, norm_rel=0.022648947313427925, ref_abs_avg=39.53282928466797, test_abs_avg=39.531883239746094
production_forward2 grad[26] vs paper_forward: mean_abs=0.8832855224609375, max_abs=4.0, mean_rel=0.17620950937271118, max_rel=21.716825485229492, norm_rel=0.026319267228245735, ref_abs_avg=33.07460021972656, test_abs_avg=33.0993537902832
production_forward2 grad[27] vs paper_forward: mean_abs=1.1172094345092773, max_abs=8.0, mean_rel=0.1892322152853012, max_rel=2249.49169921875, norm_rel=0.026173224672675133, ref_abs_avg=42.867958068847656, test_abs_avg=42.86885070800781
production_forward2 grad[28] vs paper_forward: mean_abs=1.0467329025268555, max_abs=7.25, mean_rel=0.3764685094356537, max_rel=3703.124755859375, norm_rel=0.024955345317721367, ref_abs_avg=42.2417106628418, test_abs_avg=42.23530578613281
production_forward2 grad[29] vs paper_forward: mean_abs=0.7819701433181763, max_abs=3.0, mean_rel=0.09392687678337097, max_rel=8.084576606750488, norm_rel=0.02472143992781639, ref_abs_avg=34.24162292480469, test_abs_avg=34.249237060546875
production_forward2 grad[30] vs paper_forward: mean_abs=1.0474156141281128, max_abs=7.0, mean_rel=0.18318702280521393, max_rel=2933.7451171875, norm_rel=0.02662748470902443, ref_abs_avg=39.50595474243164, test_abs_avg=39.504947662353516
production_forward2 grad[31] vs paper_forward: mean_abs=0.9803417921066284, max_abs=6.0, mean_rel=0.3795502185821533, max_rel=3374.999755859375, norm_rel=0.02521407976746559, ref_abs_avg=39.086395263671875, test_abs_avg=39.094051361083984
production_forward2 grad[32] vs paper_forward: mean_abs=0.768444299697876, max_abs=3.0, mean_rel=0.15668869018554688, max_rel=20.5518798828125, norm_rel=0.02408433146774769, ref_abs_avg=32.08180618286133, test_abs_avg=32.07012939453125
production_forward2 grad[33] vs paper_forward: mean_abs=0.9677977561950684, max_abs=7.0, mean_rel=0.1758577525615692, max_rel=1888.4019775390625, norm_rel=0.026400748640298843, ref_abs_avg=36.81586456298828, test_abs_avg=36.816837310791016
production_forward2 grad[34] vs paper_forward: mean_abs=0.9050129652023315, max_abs=5.25, mean_rel=0.3161202073097229, max_rel=2718.749755859375, norm_rel=0.024955879896879196, ref_abs_avg=36.39373016357422, test_abs_avg=36.397037506103516
production_forward2 grad[35] vs paper_forward: mean_abs=0.6865363121032715, max_abs=2.875, mean_rel=0.13151346147060394, max_rel=12.214526176452637, norm_rel=0.025811389088630676, ref_abs_avg=26.675243377685547, test_abs_avg=26.6827392578125
production_forward2 grad[36] vs paper_forward: mean_abs=0.904548168182373, max_abs=6.5, mean_rel=0.16853556036949158, max_rel=1207.3310546875, norm_rel=0.026149647310376167, ref_abs_avg=34.74600601196289, test_abs_avg=34.747535705566406
production_forward2 grad[37] vs paper_forward: mean_abs=0.8473306894302368, max_abs=5.5, mean_rel=0.31973087787628174, max_rel=2500.0, norm_rel=0.024824609979987144, ref_abs_avg=34.249664306640625, test_abs_avg=34.25432586669922
production_forward2 grad[38] vs paper_forward: mean_abs=0.683037519454956, max_abs=2.625, mean_rel=0.1144685298204422, max_rel=12.687103271484375, norm_rel=0.02455914206802845, ref_abs_avg=27.392711639404297, test_abs_avg=27.416589736938477
production_forward2 grad[39] vs paper_forward: mean_abs=0.8539028167724609, max_abs=6.0, mean_rel=0.1686633974313736, max_rel=1788.5645751953125, norm_rel=0.025871792808175087, ref_abs_avg=33.14080810546875, test_abs_avg=33.142913818359375
production_forward2 grad[40] vs paper_forward: mean_abs=0.7929452657699585, max_abs=5.5, mean_rel=0.29295194149017334, max_rel=2843.749755859375, norm_rel=0.024383632466197014, ref_abs_avg=32.60578536987305, test_abs_avg=32.60224533081055
production_forward2 grad[41] vs paper_forward: mean_abs=0.60741126537323, max_abs=2.75, mean_rel=0.3246203362941742, max_rel=130.0744171142578, norm_rel=0.02341766655445099, ref_abs_avg=26.469982147216797, test_abs_avg=26.44896125793457
production_forward2 grad[42] vs paper_forward: mean_abs=0.8062849044799805, max_abs=5.5, mean_rel=0.16548332571983337, max_rel=852.6744995117188, norm_rel=0.025681747123599052, ref_abs_avg=31.525787353515625, test_abs_avg=31.52764892578125
production_forward2 grad[43] vs paper_forward: mean_abs=0.7504590749740601, max_abs=5.0, mean_rel=0.27624037861824036, max_rel=2281.25, norm_rel=0.024149462580680847, ref_abs_avg=31.14818000793457, test_abs_avg=31.145389556884766
production_forward2 grad[44] vs paper_forward: mean_abs=0.5923426151275635, max_abs=2.3125, mean_rel=0.07176943868398666, max_rel=2.427766799926758, norm_rel=0.02345244213938713, ref_abs_avg=25.859731674194336, test_abs_avg=25.80427360534668
production_forward2 grad[45] vs paper_forward: mean_abs=0.7650101780891418, max_abs=5.25, mean_rel=0.17285792529582977, max_rel=1734.630859375, norm_rel=0.025351978838443756, ref_abs_avg=30.27716827392578, test_abs_avg=30.27728843688965
production_forward2 grad[46] vs paper_forward: mean_abs=0.7120535373687744, max_abs=5.125, mean_rel=0.24444085359573364, max_rel=1749.9998779296875, norm_rel=0.02380860224366188, ref_abs_avg=29.96070671081543, test_abs_avg=29.956459045410156
production_forward2 grad[47] vs paper_forward: mean_abs=0.5886578559875488, max_abs=2.546875, mean_rel=0.14717760682106018, max_rel=23.246679306030273, norm_rel=0.02358311042189598, ref_abs_avg=24.624263763427734, test_abs_avg=24.604652404785156
production_forward2 grad[48] vs paper_forward: mean_abs=0.7325113415718079, max_abs=5.5, mean_rel=0.17088104784488678, max_rel=1201.517822265625, norm_rel=0.02513880468904972, ref_abs_avg=29.195144653320312, test_abs_avg=29.19647216796875
production_forward2 grad[49] vs paper_forward: mean_abs=0.6842225790023804, max_abs=4.375, mean_rel=0.2873728573322296, max_rel=2125.0, norm_rel=0.02373691461980343, ref_abs_avg=28.843830108642578, test_abs_avg=28.84688949584961
production_forward2 grad[50] vs paper_forward: mean_abs=0.6115074157714844, max_abs=2.375, mean_rel=0.0664658397436142, max_rel=1.7613821029663086, norm_rel=0.024499615654349327, ref_abs_avg=24.53514289855957, test_abs_avg=24.533567428588867
production_forward2 grad[51] vs paper_forward: mean_abs=0.8201488256454468, max_abs=6.5, mean_rel=0.1687799096107483, max_rel=1657.54345703125, norm_rel=0.026492612436413765, ref_abs_avg=31.080154418945312, test_abs_avg=31.078712463378906
production_forward2 grad[52] vs paper_forward: mean_abs=0.7703737020492554, max_abs=5.0, mean_rel=0.3072052001953125, max_rel=2437.5, norm_rel=0.02526942640542984, ref_abs_avg=30.605276107788086, test_abs_avg=30.600759506225586
production_forward2 grad[53] vs paper_forward: mean_abs=0.6161609292030334, max_abs=3.66015625, mean_rel=0.10138624906539917, max_rel=6.801494598388672, norm_rel=0.026097670197486877, ref_abs_avg=23.50374412536621, test_abs_avg=23.462932586669922
production_forward2 grad[54] vs paper_forward: mean_abs=0.7594183087348938, max_abs=6.0, mean_rel=0.1679559350013733, max_rel=884.669921875, norm_rel=0.026032501831650734, ref_abs_avg=29.233217239379883, test_abs_avg=29.229991912841797
production_forward2 grad[55] vs paper_forward: mean_abs=0.7054159045219421, max_abs=5.25, mean_rel=0.28544527292251587, max_rel=2562.5, norm_rel=0.02461981773376465, ref_abs_avg=28.75311851501465, test_abs_avg=28.751161575317383
production_forward2 grad[56] vs paper_forward: mean_abs=0.5619773864746094, max_abs=2.5, mean_rel=0.10072880983352661, max_rel=7.672847270965576, norm_rel=0.026037652045488358, ref_abs_avg=21.945964813232422, test_abs_avg=21.99039077758789
production_forward2 grad[57] vs paper_forward: mean_abs=0.7038753032684326, max_abs=4.75, mean_rel=0.17130951583385468, max_rel=1378.6143798828125, norm_rel=0.02559765800833702, ref_abs_avg=27.592235565185547, test_abs_avg=27.591510772705078
production_forward2 grad[58] vs paper_forward: mean_abs=0.6595813035964966, max_abs=4.5625, mean_rel=0.2553457021713257, max_rel=2468.75, norm_rel=0.024245737120509148, ref_abs_avg=27.26534080505371, test_abs_avg=27.262815475463867
production_forward2 grad[59] vs paper_forward: mean_abs=0.5421075820922852, max_abs=1.8125, mean_rel=0.2022542655467987, max_rel=48.702178955078125, norm_rel=0.024570103734731674, ref_abs_avg=21.736732482910156, test_abs_avg=21.77392578125
production_forward2 grad[60] vs paper_forward: mean_abs=0.6612259149551392, max_abs=5.0, mean_rel=0.15711356699466705, max_rel=738.0603637695312, norm_rel=0.02493809536099434, ref_abs_avg=26.554515838623047, test_abs_avg=26.556812286376953
production_forward2 grad[61] vs paper_forward: mean_abs=0.6140117645263672, max_abs=4.375, mean_rel=0.25113445520401, max_rel=1687.4998779296875, norm_rel=0.023529604077339172, ref_abs_avg=26.139297485351562, test_abs_avg=26.140775680541992
production_forward2 grad[62] vs paper_forward: mean_abs=0.47559428215026855, max_abs=1.75, mean_rel=0.1753106415271759, max_rel=38.276424407958984, norm_rel=0.02402717061340809, ref_abs_avg=20.25870704650879, test_abs_avg=20.22303009033203
production_forward2 grad[63] vs paper_forward: mean_abs=0.618538498878479, max_abs=6.0, mean_rel=0.15653321146965027, max_rel=1367.796630859375, norm_rel=0.024692371487617493, ref_abs_avg=25.1188907623291, test_abs_avg=25.118711471557617
production_forward2 grad[64] vs paper_forward: mean_abs=0.5781892538070679, max_abs=4.875, mean_rel=0.22704845666885376, max_rel=2437.5, norm_rel=0.02303115464746952, ref_abs_avg=25.103477478027344, test_abs_avg=25.108856201171875
production_forward2 grad[65] vs paper_forward: mean_abs=0.4569988250732422, max_abs=1.6875, mean_rel=0.11729355156421661, max_rel=18.14957618713379, norm_rel=0.022785436362028122, ref_abs_avg=19.82560920715332, test_abs_avg=19.83574867248535
production_forward2 grad[66] vs paper_forward: mean_abs=0.5839673280715942, max_abs=4.5, mean_rel=0.15563440322875977, max_rel=919.1430053710938, norm_rel=0.024262381717562675, ref_abs_avg=24.14640235900879, test_abs_avg=24.145456314086914
production_forward2 grad[67] vs paper_forward: mean_abs=0.5484311580657959, max_abs=3.5, mean_rel=0.22875258326530457, max_rel=1499.9998779296875, norm_rel=0.02260642498731613, ref_abs_avg=24.236614227294922, test_abs_avg=24.241485595703125
production_forward2 grad[68] vs paper_forward: mean_abs=0.43140310049057007, max_abs=1.75, mean_rel=0.11769759654998779, max_rel=17.58217430114746, norm_rel=0.02251148782670498, ref_abs_avg=19.723663330078125, test_abs_avg=19.711057662963867
production_forward2 grad[69] vs paper_forward: mean_abs=0.5612678527832031, max_abs=5.0, mean_rel=0.14658565819263458, max_rel=786.3098754882812, norm_rel=0.023694785311818123, ref_abs_avg=23.66832733154297, test_abs_avg=23.668617248535156
production_forward2 grad[70] vs paper_forward: mean_abs=0.5159127712249756, max_abs=3.84375, mean_rel=0.22488738596439362, max_rel=1749.9998779296875, norm_rel=0.02225840464234352, ref_abs_avg=23.237186431884766, test_abs_avg=23.237503051757812
production_forward2 grad[71] vs paper_forward: mean_abs=0.4223663806915283, max_abs=1.84375, mean_rel=0.11458681523799896, max_rel=15.8576078414917, norm_rel=0.02287261374294758, ref_abs_avg=18.641746520996094, test_abs_avg=18.628337860107422
production_forward2 grad[72] vs paper_forward: mean_abs=0.5347117781639099, max_abs=4.5, mean_rel=0.15446415543556213, max_rel=1339.6884765625, norm_rel=0.023472867906093597, ref_abs_avg=22.810489654541016, test_abs_avg=22.810882568359375
production_forward2 grad[73] vs paper_forward: mean_abs=0.4913299083709717, max_abs=3.75, mean_rel=0.2246241718530655, max_rel=1749.9998779296875, norm_rel=0.021423911675810814, ref_abs_avg=22.852657318115234, test_abs_avg=22.85690689086914
production_forward2 grad[74] vs paper_forward: mean_abs=0.4825611114501953, max_abs=1.875, mean_rel=0.1119968369603157, max_rel=9.028992652893066, norm_rel=0.023921964690089226, ref_abs_avg=20.434337615966797, test_abs_avg=20.483749389648438
production_forward2 grad[75] vs paper_forward: mean_abs=0.6110146045684814, max_abs=4.5, mean_rel=0.15584293007850647, max_rel=891.6052856445312, norm_rel=0.024523548781871796, ref_abs_avg=24.97972869873047, test_abs_avg=24.979564666748047
production_forward2 grad[76] vs paper_forward: mean_abs=0.5643004775047302, max_abs=4.5, mean_rel=0.20304778218269348, max_rel=1765.6248779296875, norm_rel=0.02250785566866398, ref_abs_avg=25.0920467376709, test_abs_avg=25.09661865234375
production_forward2 grad[77] vs paper_forward: mean_abs=0.4350433349609375, max_abs=1.65625, mean_rel=0.07531276345252991, max_rel=3.5976529121398926, norm_rel=0.02247963286936283, ref_abs_avg=19.377443313598633, test_abs_avg=19.384294509887695
production_forward2 grad[78] vs paper_forward: mean_abs=0.5545938014984131, max_abs=5.0, mean_rel=0.1463528871536255, max_rel=739.5451049804688, norm_rel=0.024054065346717834, ref_abs_avg=23.10856819152832, test_abs_avg=23.11048698425293
production_forward2 grad[79] vs paper_forward: mean_abs=0.5172171592712402, max_abs=4.1875, mean_rel=0.1978350281715393, max_rel=1406.2498779296875, norm_rel=0.022271135821938515, ref_abs_avg=23.236553192138672, test_abs_avg=23.237051010131836
production_forward2 grad[80] vs paper_forward: mean_abs=0.4006386995315552, max_abs=1.75, mean_rel=0.11210756748914719, max_rel=14.725831985473633, norm_rel=0.021526845172047615, ref_abs_avg=18.730911254882812, test_abs_avg=18.747377395629883
production_forward2 grad[81] vs paper_forward: mean_abs=0.5154680013656616, max_abs=4.5, mean_rel=0.14159977436065674, max_rel=800.0531616210938, norm_rel=0.023340988904237747, ref_abs_avg=22.169921875, test_abs_avg=22.17055892944336
production_forward2 grad[82] vs paper_forward: mean_abs=0.4748857617378235, max_abs=3.875, mean_rel=0.22768181562423706, max_rel=1578.1248779296875, norm_rel=0.021081149578094482, ref_abs_avg=22.55801773071289, test_abs_avg=22.555450439453125
production_forward2 grad[83] vs paper_forward: mean_abs=0.3975248336791992, max_abs=1.75, mean_rel=0.10154178738594055, max_rel=15.266194343566895, norm_rel=0.022444937378168106, ref_abs_avg=17.997631072998047, test_abs_avg=17.96437644958496
production_forward2 grad[84] vs paper_forward: mean_abs=0.48772865533828735, max_abs=4.5, mean_rel=0.14873379468917847, max_rel=763.47509765625, norm_rel=0.02270212583243847, ref_abs_avg=21.562923431396484, test_abs_avg=21.563983917236328
production_forward2 grad[85] vs paper_forward: mean_abs=0.44270059466362, max_abs=4.0, mean_rel=0.19274353981018066, max_rel=1562.4998779296875, norm_rel=0.020750796422362328, ref_abs_avg=21.369182586669922, test_abs_avg=21.371612548828125
production_forward2 grad[86] vs paper_forward: mean_abs=0.3478432595729828, max_abs=1.65625, mean_rel=0.376299649477005, max_rel=146.66224670410156, norm_rel=0.020463308319449425, ref_abs_avg=17.463703155517578, test_abs_avg=17.43661117553711
production_forward2 grad[87] vs paper_forward: mean_abs=0.4598233699798584, max_abs=8.0, mean_rel=0.13412675261497498, max_rel=964.685546875, norm_rel=0.022102022543549538, ref_abs_avg=20.942527770996094, test_abs_avg=20.943798065185547
production_forward2 grad[88] vs paper_forward: mean_abs=0.41480329632759094, max_abs=3.8125, mean_rel=0.1714402139186859, max_rel=1140.625, norm_rel=0.020585162565112114, ref_abs_avg=20.21017074584961, test_abs_avg=20.213525772094727
production_forward2 grad[89] vs paper_forward: mean_abs=0.3412151336669922, max_abs=1.625, mean_rel=0.0822543054819107, max_rel=8.311634063720703, norm_rel=0.021006591618061066, ref_abs_avg=16.982730865478516, test_abs_avg=16.962295532226562
production_forward2 grad[90] vs paper_forward: mean_abs=0.42528432607650757, max_abs=6.0, mean_rel=0.13233746588230133, max_rel=987.3955078125, norm_rel=0.021620040759444237, ref_abs_avg=19.855121612548828, test_abs_avg=19.85515785217285
production_forward2 grad[91] vs paper_forward: mean_abs=0.38515764474868774, max_abs=3.5, mean_rel=0.19401836395263672, max_rel=1437.4998779296875, norm_rel=0.019996710121631622, ref_abs_avg=19.42510414123535, test_abs_avg=19.427165985107422
production_forward2 grad[92] vs paper_forward: mean_abs=0.31716418266296387, max_abs=1.25, mean_rel=0.09143146872520447, max_rel=8.548651695251465, norm_rel=0.0211640615016222, ref_abs_avg=15.166847229003906, test_abs_avg=15.186172485351562
production_forward2 grad[93] vs paper_forward: mean_abs=0.4078369736671448, max_abs=4.546875, mean_rel=0.12919840216636658, max_rel=700.9278564453125, norm_rel=0.021281296387314796, ref_abs_avg=19.407344818115234, test_abs_avg=19.408870697021484
production_forward2 grad[94] vs paper_forward: mean_abs=0.367990106344223, max_abs=3.5, mean_rel=0.17489910125732422, max_rel=1406.2498779296875, norm_rel=0.019134679809212685, ref_abs_avg=19.299041748046875, test_abs_avg=19.295642852783203
production_forward2 grad[95] vs paper_forward: mean_abs=0.3110518455505371, max_abs=1.0, mean_rel=0.0832880288362503, max_rel=6.078653335571289, norm_rel=0.01949404925107956, ref_abs_avg=15.70404052734375, test_abs_avg=15.714378356933594
production_forward2 grad[96] vs paper_forward: mean_abs=0.3770543932914734, max_abs=4.5, mean_rel=0.12692981958389282, max_rel=933.27783203125, norm_rel=0.020688753575086594, ref_abs_avg=18.546926498413086, test_abs_avg=18.546092987060547
production_forward2 grad[97] vs paper_forward: mean_abs=0.3506653308868408, max_abs=3.5, mean_rel=0.17946857213974, max_rel=1812.4998779296875, norm_rel=0.019191430881619453, ref_abs_avg=18.506404876708984, test_abs_avg=18.507673263549805

