identity layers + randn queries
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 13.01s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_out_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16'),
finished after 5.19s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 13.65s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 16.92s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 17.37s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 8.08s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 13.50s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (131072, 512, 1, 'torch.float32', 'torch.float32'),
finished after 2.06s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_backward_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 6.64s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_2_reduce_grad_pseudo_query_kernel,
with key as (131072, 512, 'torch.float32', 'torch.float32'),
finished after 2.01s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 50.26s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (131072, 512, 8, 'torch.float32', 'torch.float32'),
finished after 1.96s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 43.70s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 32.99s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 20.00s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None;
production_forward2 fwd+bwd:  113.442 ms
production_forward2 bwd-only: 95.827 ms
production_forward2 peak allocated: fwd=2.176 GiB, fwd+bwd=10.180 GiB
production_forward2 peak reserved:  fwd=2.193 GiB, fwd+bwd=10.193 GiB
production_forward fwd+bwd:  113.580 ms
production_forward bwd-only: 95.979 ms
production_forward peak allocated: fwd=2.176 GiB, fwd+bwd=10.180 GiB
production_forward peak reserved:  fwd=2.193 GiB, fwd+bwd=10.193 GiB

/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py:321: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
/usr/local/lib/python3.12/dist-packages/torch/_inductor/select_algorithm.py:3464: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  current_size = base.storage().size()
Autotune Choices Stats:
{"num_choices": 15, "num_triton_choices": 14, "best_kernel": "mm", "best_time": 0.347135990858078, "best_triton_pos": 1, "best_triton_time": 0.3768320083618164, "best_triton_kernel": "triton_mm_10", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(131072x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 0.3471 ms 100.0% 
  triton_mm_10 0.3768 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_13 0.3820 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_7 0.5857 ms 59.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_5 0.5878 ms 59.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_6 0.5888 ms 59.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_8 0.5888 ms 59.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_3 0.5908 ms 58.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_4 0.6103 ms 56.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_11 0.6175 ms 56.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.6923 seconds and 0.7618 seconds precompiling for 15 choices
Autotune Choices Stats:
{"num_choices": 15, "num_triton_choices": 14, "best_kernel": "mm", "best_time": 0.6369280219078064, "best_triton_pos": 1, "best_triton_time": 0.7024639844894409, "best_triton_kernel": "triton_mm_24", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(262144x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 0.6369 ms 100.0% 
  triton_mm_24 0.7025 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_27 0.7158 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_20 1.1561 ms 55.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_21 1.1602 ms 54.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_19 1.1612 ms 54.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_26 1.1612 ms 54.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8
  triton_mm_25 1.1643 ms 54.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_22 1.1653 ms 54.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_18 1.2032 ms 52.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.6159 seconds and 0.8600 seconds precompiling for 15 choices
Autotune Choices Stats:
{"num_choices": 6, "num_triton_choices": 0, "best_kernel": "mm", "best_time": 0.5765119791030884}
AUTOTUNE mm(512x262144, 262144x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  mm 0.5765 ms 100.0% 
  decompose_k_mm_128_split_3 2.3081 ms 25.0% k_split=128
  decompose_k_mm_256_split_4 2.3183 ms 24.9% k_split=256
  decompose_k_mm_64_split_2 2.7628 ms 20.9% k_split=64
  decompose_k_mm_16_split_0 3.6762 ms 15.7% k_split=16
  decompose_k_mm_32_split_1 3.6792 ms 15.7% k_split=32
SingleProcess AUTOTUNE benchmarking takes 4.4715 seconds and 0.0005 seconds precompiling for 6 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.3041279911994934, "best_triton_pos": 0}
AUTOTUNE mm(262144x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_35 0.3041 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_36 0.3062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_33 0.3072 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_38 0.3072 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_39 0.3215 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  mm 0.3502 ms 86.8% 
  triton_mm_34 0.3533 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_37 0.3553 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_40 0.3584 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_32 0.3717 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.7075 seconds and 0.0003 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 15, "num_triton_choices": 14, "best_kernel": "mm", "best_time": 1.2451839447021484, "best_triton_pos": 1, "best_triton_time": 1.3905919790267944, "best_triton_kernel": "triton_mm_55", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"}
AUTOTUNE mm(524288x512, 512x8)
strides: [512, 1], [1, 512]
dtypes: torch.float32, torch.float32
  mm 1.2452 ms 100.0% 
  triton_mm_55 1.3906 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_58 1.4121 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_46 2.2866 ms 54.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_mm_51 2.2968 ms 54.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_57 2.3030 ms 54.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8
  triton_mm_52 2.3071 ms 54.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_50 2.3091 ms 53.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_56 2.3183 ms 53.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_53 2.3194 ms 53.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.6112 seconds and 0.0004 seconds precompiling for 15 choices
Autotune Choices Stats:
{"num_choices": 6, "num_triton_choices": 0, "best_kernel": "mm", "best_time": 0.9349120259284973}
AUTOTUNE mm(512x524288, 524288x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  mm 0.9349 ms 100.0% 
  decompose_k_mm_128_split_8 3.7652 ms 24.8% k_split=128
  decompose_k_mm_64_split_7 4.5117 ms 20.7% k_split=64
  decompose_k_mm_256_split_9 4.6039 ms 20.3% k_split=256
  decompose_k_mm_32_split_6 6.2228 ms 15.0% k_split=32
  decompose_k_mm_16_split_5 7.3329 ms 12.7% k_split=16
SingleProcess AUTOTUNE benchmarking takes 3.0381 seconds and 0.0006 seconds precompiling for 6 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_66", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.6195200085639954, "best_triton_pos": 0}
AUTOTUNE mm(524288x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_66 0.6195 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_69 0.6226 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_67 0.6246 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_70 0.6441 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  mm 0.6523 ms 95.0% 
  triton_mm_68 0.6892 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_71 0.7004 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_64 0.7045 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_73 0.7107 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_74 0.7588 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.6232 seconds and 0.0004 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_86", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.737280011177063, "best_triton_pos": 0}
AUTOTUNE mm(655360x1, 1x512)
strides: [1, 0], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_86 0.7373 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_84 0.7404 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_83 0.7424 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_87 0.7485 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_81 0.7895 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_82 0.8049 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_85 0.8212 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_88 0.8530 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  mm 0.8591 ms 85.8% 
  triton_mm_80 0.8622 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.7398 seconds and 0.7796 seconds precompiling for 18 choices
Autotune Choices Stats:
{"num_choices": 9, "num_triton_choices": 0, "best_kernel": "mm", "best_time": 0.29900801181793213}
AUTOTUNE mm(512x131072, 131072x8)
strides: [1, 512], [8, 1]
dtypes: torch.float32, torch.float32
  mm 0.2990 ms 100.0% 
  decompose_k_mm_128_split_16 0.9554 ms 31.3% k_split=128
  decompose_k_mm_256_split_17 0.9636 ms 31.0% k_split=256
  decompose_k_mm_64_split_15 1.1407 ms 26.2% k_split=64
  decompose_k_mm_32_split_14 1.5145 ms 19.7% k_split=32
  decompose_k_mm_16_split_13 1.5299 ms 19.5% k_split=16
  decompose_k_mm_8_split_12 3.0433 ms 9.8% k_split=8
  decompose_k_mm_4_split_11 6.0703 ms 4.9% k_split=4
  decompose_k_mm_2_split_10 14.6412 ms 2.0% k_split=2
SingleProcess AUTOTUNE benchmarking takes 3.1047 seconds and 0.0003 seconds precompiling for 9 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_98", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.14950400590896606, "best_triton_pos": 0}
AUTOTUNE mm(131072x8, 8x512)
strides: [8, 1], [512, 1]
dtypes: torch.float32, torch.float32
  triton_mm_98 0.1495 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_100 0.1495 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_103 0.1526 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  mm 0.1536 ms 97.3% 
  triton_mm_101 0.1536 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_104 0.1587 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_102 0.1720 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_99 0.1741 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_105 0.1772 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_mm_95 0.1802 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.5519 seconds and 0.0002 seconds precompiling for 18 choices

torch_compile_phases_forward fwd+bwd:  165.939 ms
torch_compile_phases_forward bwd-only: 132.665 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB

/usr/local/lib/python3.12/dist-packages/torch/_inductor/lowering.py:7627: UserWarning: 
Online softmax is disabled on the fly since Inductor decides to
split the reduction. Cut an issue to PyTorch if this is an
important use case and you want to speed it up with online
softmax.

  warnings.warn(
E0429 14:48:56.911000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:56.911000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:56.911000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.911000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:56.911000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.911000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:56.911000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:56.925000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:56.925000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:56.925000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.925000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:56.925000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.925000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:56.925000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:56.938000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:56.938000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:56.938000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.938000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:56.938000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.938000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:56.938000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:56.951000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:56.951000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:56.951000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.951000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:56.951000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.951000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:56.951000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:56.964000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:56.964000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:56.964000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.964000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:56.964000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.964000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:56.964000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:56.977000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:56.977000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:56.977000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.977000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:56.977000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.977000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:56.977000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:56.990000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:56.990000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:56.990000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.990000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:56.990000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:56.990000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:56.990000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:57.003000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:57.003000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:57.003000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.003000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:57.003000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.003000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:57.003000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:57.014000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:57.014000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:57.014000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.014000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:57.014000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.014000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:57.014000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:57.028000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:57.028000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:57.028000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.028000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:57.028000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.028000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:57.028000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:57.041000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:57.041000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:57.041000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.041000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:57.041000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.041000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:57.041000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
E0429 14:48:57.054000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Runtime error during autotuning: 
E0429 14:48:57.054000 588 torch/_inductor/select_algorithm.py:3727] [2/1] CUDA driver error: invalid argument
E0429 14:48:57.054000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.054000 588 torch/_inductor/select_algorithm.py:3727] [2/1] This may mean this GPU is too small for max_autotune mode.
E0429 14:48:57.054000 588 torch/_inductor/select_algorithm.py:3727] [2/1] 
E0429 14:48:57.054000 588 torch/_inductor/select_algorithm.py:3727] [2/1] . 
E0429 14:48:57.054000 588 torch/_inductor/select_algorithm.py:3727] [2/1] Ignoring this choice.
Autotune Choices Stats:
{"num_choices": 13, "num_triton_choices": 12, "best_kernel": "bmm", "best_time": 3.463167905807495, "best_triton_pos": 1, "best_triton_time": Infinity, "best_triton_kernel": "triton_bmm_110", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2"}
AUTOTUNE bmm(131072x2x1, 131072x1x512)
strides: [1, 131072, 0], [512, 0, 1]
dtypes: torch.float32, torch.float32
  bmm 3.4632 ms 100.0% 
  triton_bmm_110 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
  triton_bmm_111 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2
  triton_bmm_112 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_bmm_113 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2
  triton_bmm_114 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_bmm_115 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_bmm_116 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_bmm_117 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  triton_bmm_118 inf ms 0.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
SingleProcess AUTOTUNE benchmarking takes 0.2424 seconds and 0.0005 seconds precompiling for 13 choices
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_129", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.29900801181793213, "best_triton_pos": 0}
AUTOTUNE mm(512x1, 1x262144)
strides: [1, 512], [0, 1]
dtypes: torch.float32, torch.float32
  triton_mm_129 0.2990 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_127 0.3011 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_133 0.3082 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_130 0.3113 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_132 0.3123 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_128 0.3246 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_126 0.3318 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_134 0.3328 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
  mm 0.3441 ms 86.9% 
  triton_mm_131 0.3441 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.7091 seconds and 0.6861 seconds precompiling for 18 choices
/usr/local/lib/python3.12/dist-packages/torch/_inductor/lowering.py:7627: UserWarning: 
Online softmax is disabled on the fly since Inductor decides to
split the reduction. Cut an issue to PyTorch if this is an
important use case and you want to speed it up with online
softmax.

  warnings.warn(
Autotune Choices Stats:
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.14847999811172485, "best_triton_pos": 1, "best_triton_time": 0.14847999811172485, "best_triton_kernel": "triton_mm_144", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4"}
AUTOTUNE mm(512x1, 1x131072)
strides: [1, 512], [0, 1]
dtypes: torch.float32, torch.float32
  mm 0.1485 ms 100.0% 
  triton_mm_144 0.1485 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4
  triton_mm_146 0.1495 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_143 0.1556 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4
  triton_mm_145 0.1577 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8
  triton_mm_147 0.1587 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_149 0.1587 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4
  triton_mm_150 0.1597 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4
  triton_mm_139 0.1628 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2
  triton_mm_141 0.1679 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8
SingleProcess AUTOTUNE benchmarking takes 0.5435 seconds and 1.1078 seconds precompiling for 18 choices

paper_forward fwd+bwd:  381.898 ms
paper_forward bwd-only: 301.806 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.740 GiB, fwd+bwd=32.490 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001633478095754981, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008257539011538029, max_abs=0.515625, mean_rel=0.07111905515193939, max_rel=98.14385986328125, norm_rel=0.019344216212630272, ref_abs_avg=0.4628099501132965, test_abs_avg=0.4628225862979889
production_forward grad[1] vs paper_forward: mean_abs=7.205449104309082, max_abs=64.0, mean_rel=0.1872253119945526, max_rel=544.744873046875, norm_rel=0.019955100491642952, ref_abs_avg=319.65576171875, test_abs_avg=319.8017883300781
production_forward grad[2] vs paper_forward: mean_abs=1.2501263618469238, max_abs=5.0, mean_rel=0.14010962843894958, max_rel=17.962556838989258, norm_rel=0.02306847833096981, ref_abs_avg=54.56006622314453, test_abs_avg=54.65341567993164
production_forward grad[3] vs paper_forward: mean_abs=1.5758439302444458, max_abs=10.0, mean_rel=0.16625526547431946, max_rel=1835.102783203125, norm_rel=0.024142036214470863, ref_abs_avg=65.75184631347656, test_abs_avg=65.758056640625
production_forward grad[4] vs paper_forward: mean_abs=1.457839012145996, max_abs=9.0, mean_rel=0.41262924671173096, max_rel=3515.624755859375, norm_rel=0.022567598149180412, ref_abs_avg=65.00297546386719, test_abs_avg=65.00471496582031
production_forward grad[5] vs paper_forward: mean_abs=1.0926523208618164, max_abs=4.5, mean_rel=0.07541301846504211, max_rel=3.1661598682403564, norm_rel=0.023604294285178185, ref_abs_avg=46.38801574707031, test_abs_avg=46.323875427246094
production_forward grad[6] vs paper_forward: mean_abs=1.3886687755584717, max_abs=11.0, mean_rel=0.1592673361301422, max_rel=1329.184814453125, norm_rel=0.023985208943486214, ref_abs_avg=58.367591857910156, test_abs_avg=58.368621826171875
production_forward grad[7] vs paper_forward: mean_abs=1.2730573415756226, max_abs=8.0, mean_rel=0.3703366816043854, max_rel=4187.5, norm_rel=0.02231862209737301, ref_abs_avg=57.472023010253906, test_abs_avg=57.47222137451172
production_forward grad[8] vs paper_forward: mean_abs=1.0403048992156982, max_abs=3.5, mean_rel=0.14239874482154846, max_rel=30.63620948791504, norm_rel=0.024217864498496056, ref_abs_avg=43.11005783081055, test_abs_avg=43.052635192871094
production_forward grad[9] vs paper_forward: mean_abs=1.270287036895752, max_abs=9.0, mean_rel=0.1637004017829895, max_rel=2919.664794921875, norm_rel=0.023756330832839012, ref_abs_avg=53.86910629272461, test_abs_avg=53.86975860595703
production_forward grad[10] vs paper_forward: mean_abs=1.1647045612335205, max_abs=6.75, mean_rel=0.4613497257232666, max_rel=4781.25, norm_rel=0.022197721526026726, ref_abs_avg=52.80781173706055, test_abs_avg=52.80363464355469
production_forward grad[11] vs paper_forward: mean_abs=0.9172935485839844, max_abs=3.25, mean_rel=0.0887511670589447, max_rel=4.6226911544799805, norm_rel=0.020775172859430313, ref_abs_avg=44.6590461730957, test_abs_avg=44.663719177246094
production_forward grad[12] vs paper_forward: mean_abs=1.1649024486541748, max_abs=8.0, mean_rel=0.1563510298728943, max_rel=1522.494384765625, norm_rel=0.023562293499708176, ref_abs_avg=49.7492790222168, test_abs_avg=49.7506103515625
production_forward grad[13] vs paper_forward: mean_abs=1.0668437480926514, max_abs=6.25, mean_rel=0.37649062275886536, max_rel=2874.999755859375, norm_rel=0.02188897505402565, ref_abs_avg=48.99083709716797, test_abs_avg=48.98691940307617
production_forward grad[14] vs paper_forward: mean_abs=0.848663330078125, max_abs=3.25, mean_rel=0.10763120651245117, max_rel=17.43576431274414, norm_rel=0.021005135029554367, ref_abs_avg=39.881309509277344, test_abs_avg=39.87255859375
production_forward grad[15] vs paper_forward: mean_abs=1.094990611076355, max_abs=7.0, mean_rel=0.15183836221694946, max_rel=1133.0506591796875, norm_rel=0.023444026708602905, ref_abs_avg=47.02465057373047, test_abs_avg=47.02617263793945
production_forward grad[16] vs paper_forward: mean_abs=1.0013651847839355, max_abs=6.75, mean_rel=0.2980579733848572, max_rel=2687.499755859375, norm_rel=0.021821144968271255, ref_abs_avg=46.133567810058594, test_abs_avg=46.14125442504883
production_forward grad[17] vs paper_forward: mean_abs=0.7903375625610352, max_abs=3.25, mean_rel=0.05996483936905861, max_rel=1.883025050163269, norm_rel=0.020611261948943138, ref_abs_avg=39.607421875, test_abs_avg=39.604637145996094
production_forward grad[18] vs paper_forward: mean_abs=1.0209214687347412, max_abs=8.0, mean_rel=0.1617361456155777, max_rel=1372.2750244140625, norm_rel=0.023234162479639053, ref_abs_avg=44.20476150512695, test_abs_avg=44.20779037475586
production_forward grad[19] vs paper_forward: mean_abs=0.9398967623710632, max_abs=5.78125, mean_rel=0.2744932174682617, max_rel=3499.999755859375, norm_rel=0.021568888798356056, ref_abs_avg=43.7913703918457, test_abs_avg=43.795169830322266
production_forward grad[20] vs paper_forward: mean_abs=0.7523727416992188, max_abs=3.0, mean_rel=0.09606119245290756, max_rel=9.970006942749023, norm_rel=0.0224323607981205, ref_abs_avg=33.33338928222656, test_abs_avg=33.290008544921875
production_forward grad[21] vs paper_forward: mean_abs=0.974308431148529, max_abs=7.5, mean_rel=0.16259215772151947, max_rel=1903.1201171875, norm_rel=0.023200633004307747, ref_abs_avg=42.23298263549805, test_abs_avg=42.23289489746094
production_forward grad[22] vs paper_forward: mean_abs=0.8944091796875, max_abs=5.78125, mean_rel=0.29242298007011414, max_rel=3218.749755859375, norm_rel=0.02158360555768013, ref_abs_avg=41.61048889160156, test_abs_avg=41.61630630493164
production_forward grad[23] vs paper_forward: mean_abs=0.7243638038635254, max_abs=2.96875, mean_rel=0.13465692102909088, max_rel=11.623663902282715, norm_rel=0.021558180451393127, ref_abs_avg=34.23834228515625, test_abs_avg=34.232994079589844
production_forward grad[24] vs paper_forward: mean_abs=0.9288046360015869, max_abs=7.0, mean_rel=0.14933937788009644, max_rel=1043.4564208984375, norm_rel=0.02302238903939724, ref_abs_avg=40.58786392211914, test_abs_avg=40.58925247192383
production_forward grad[25] vs paper_forward: mean_abs=0.8466500043869019, max_abs=5.21875, mean_rel=0.2855740785598755, max_rel=2593.749755859375, norm_rel=0.021292950958013535, ref_abs_avg=39.93577575683594, test_abs_avg=39.9398193359375
production_forward grad[26] vs paper_forward: mean_abs=0.8299088478088379, max_abs=4.28125, mean_rel=0.13300447165966034, max_rel=25.22145652770996, norm_rel=0.024393700063228607, ref_abs_avg=35.917110443115234, test_abs_avg=35.963890075683594
production_forward grad[27] vs paper_forward: mean_abs=1.07735276222229, max_abs=7.0, mean_rel=0.17432911694049835, max_rel=2506.41845703125, norm_rel=0.024836309254169464, ref_abs_avg=43.61906433105469, test_abs_avg=43.61848449707031
production_forward grad[28] vs paper_forward: mean_abs=0.9966209530830383, max_abs=6.0625, mean_rel=0.33297285437583923, max_rel=4750.0, norm_rel=0.023141777142882347, ref_abs_avg=43.233497619628906, test_abs_avg=43.244937896728516
production_forward grad[29] vs paper_forward: mean_abs=0.7989349365234375, max_abs=3.25, mean_rel=0.13423548638820648, max_rel=8.45030403137207, norm_rel=0.02400190569460392, ref_abs_avg=33.34352493286133, test_abs_avg=33.42670822143555
production_forward grad[30] vs paper_forward: mean_abs=0.9897015690803528, max_abs=7.5, mean_rel=0.16684892773628235, max_rel=910.2296142578125, norm_rel=0.025188898667693138, ref_abs_avg=39.49658203125, test_abs_avg=39.49785232543945
production_forward grad[31] vs paper_forward: mean_abs=0.9220681190490723, max_abs=6.625, mean_rel=0.3293399512767792, max_rel=2749.999755859375, norm_rel=0.023667411878705025, ref_abs_avg=39.143959045410156, test_abs_avg=39.14204025268555
production_forward grad[32] vs paper_forward: mean_abs=0.7363872528076172, max_abs=3.0, mean_rel=0.07279999554157257, max_rel=4.175570964813232, norm_rel=0.023979634046554565, ref_abs_avg=31.040504455566406, test_abs_avg=30.954687118530273
production_forward grad[33] vs paper_forward: mean_abs=0.9198131561279297, max_abs=5.75, mean_rel=0.16121244430541992, max_rel=1032.059326171875, norm_rel=0.0250199344009161, ref_abs_avg=36.96323776245117, test_abs_avg=36.963958740234375
production_forward grad[34] vs paper_forward: mean_abs=0.8514114618301392, max_abs=5.0, mean_rel=0.2690410912036896, max_rel=2187.5, norm_rel=0.023482685908675194, ref_abs_avg=36.348297119140625, test_abs_avg=36.35802459716797
production_forward grad[35] vs paper_forward: mean_abs=0.6947860717773438, max_abs=2.75, mean_rel=0.05630755424499512, max_rel=2.228774070739746, norm_rel=0.02346654236316681, ref_abs_avg=30.47289276123047, test_abs_avg=30.408447265625
production_forward grad[36] vs paper_forward: mean_abs=0.8642527461051941, max_abs=6.0, mean_rel=0.16477379202842712, max_rel=1431.9102783203125, norm_rel=0.024927562102675438, ref_abs_avg=34.82465362548828, test_abs_avg=34.82550048828125
production_forward grad[37] vs paper_forward: mean_abs=0.8001699447631836, max_abs=4.875, mean_rel=0.28861457109451294, max_rel=2500.0, norm_rel=0.02334744483232498, ref_abs_avg=34.327762603759766, test_abs_avg=34.330406188964844
production_forward grad[38] vs paper_forward: mean_abs=0.6534538269042969, max_abs=2.625, mean_rel=0.10874753445386887, max_rel=17.48021125793457, norm_rel=0.02278260886669159, ref_abs_avg=28.42387580871582, test_abs_avg=28.45887565612793
production_forward grad[39] vs paper_forward: mean_abs=0.8176848888397217, max_abs=5.5, mean_rel=0.15434539318084717, max_rel=960.3948974609375, norm_rel=0.024433588609099388, ref_abs_avg=33.57039260864258, test_abs_avg=33.571815490722656
production_forward grad[40] vs paper_forward: mean_abs=0.7580397725105286, max_abs=5.0, mean_rel=0.31103256344795227, max_rel=2749.999755859375, norm_rel=0.02306702546775341, ref_abs_avg=32.9521484375, test_abs_avg=32.95039367675781
production_forward grad[41] vs paper_forward: mean_abs=0.5957679748535156, max_abs=2.359375, mean_rel=0.10184772312641144, max_rel=7.174206733703613, norm_rel=0.02174048125743866, ref_abs_avg=27.941038131713867, test_abs_avg=27.903200149536133
production_forward grad[42] vs paper_forward: mean_abs=0.7780894637107849, max_abs=6.0, mean_rel=0.1556810438632965, max_rel=1125.3284912109375, norm_rel=0.024189114570617676, ref_abs_avg=32.22244644165039, test_abs_avg=32.222801208496094
production_forward grad[43] vs paper_forward: mean_abs=0.7217513918876648, max_abs=4.5, mean_rel=0.2728955149650574, max_rel=2125.0, norm_rel=0.023003630340099335, ref_abs_avg=31.433212280273438, test_abs_avg=31.42707061767578
production_forward grad[44] vs paper_forward: mean_abs=0.5361502170562744, max_abs=2.25, mean_rel=0.4353620409965515, max_rel=170.19097900390625, norm_rel=0.020043810829520226, ref_abs_avg=26.964502334594727, test_abs_avg=26.959003448486328
production_forward grad[45] vs paper_forward: mean_abs=0.735532283782959, max_abs=5.0, mean_rel=0.16005302965641022, max_rel=942.637939453125, norm_rel=0.024224815890192986, ref_abs_avg=30.483619689941406, test_abs_avg=30.48486328125
production_forward grad[46] vs paper_forward: mean_abs=0.6788387298583984, max_abs=4.625, mean_rel=0.27982914447784424, max_rel=3062.499755859375, norm_rel=0.022588223218917847, ref_abs_avg=30.13311767578125, test_abs_avg=30.131362915039062
production_forward grad[47] vs paper_forward: mean_abs=0.5372967720031738, max_abs=2.25, mean_rel=0.15617360174655914, max_rel=12.174405097961426, norm_rel=0.021525949239730835, ref_abs_avg=25.677217483520508, test_abs_avg=25.734899520874023
production_forward grad[48] vs paper_forward: mean_abs=0.7150889039039612, max_abs=6.0, mean_rel=0.1527353674173355, max_rel=1376.7135009765625, norm_rel=0.02390715666115284, ref_abs_avg=29.997859954833984, test_abs_avg=29.996402740478516
production_forward grad[49] vs paper_forward: mean_abs=0.6594635248184204, max_abs=5.25, mean_rel=0.2408795803785324, max_rel=2250.0, norm_rel=0.02263766899704933, ref_abs_avg=29.202930450439453, test_abs_avg=29.20625114440918
production_forward grad[50] vs paper_forward: mean_abs=0.6336860656738281, max_abs=2.5625, mean_rel=0.0932953804731369, max_rel=6.062620639801025, norm_rel=0.02529011107981205, ref_abs_avg=24.933258056640625, test_abs_avg=24.88225555419922
production_forward grad[51] vs paper_forward: mean_abs=0.7809044122695923, max_abs=5.5, mean_rel=0.1683441400527954, max_rel=1147.653076171875, norm_rel=0.025707386434078217, ref_abs_avg=30.463340759277344, test_abs_avg=30.462684631347656
production_forward grad[52] vs paper_forward: mean_abs=0.7300242185592651, max_abs=4.5, mean_rel=0.2669707238674164, max_rel=2265.625, norm_rel=0.024059684947133064, ref_abs_avg=30.374610900878906, test_abs_avg=30.377796173095703
production_forward grad[53] vs paper_forward: mean_abs=0.5752172470092773, max_abs=1.9375, mean_rel=0.09623073041439056, max_rel=3.0807957649230957, norm_rel=0.02446109987795353, ref_abs_avg=22.766483306884766, test_abs_avg=22.77743911743164
production_forward grad[54] vs paper_forward: mean_abs=0.7191876173019409, max_abs=5.0, mean_rel=0.16423672437667847, max_rel=1426.380859375, norm_rel=0.025156868621706963, ref_abs_avg=28.662372589111328, test_abs_avg=28.664464950561523
production_forward grad[55] vs paper_forward: mean_abs=0.6673415899276733, max_abs=4.5, mean_rel=0.26541346311569214, max_rel=1937.4998779296875, norm_rel=0.02361196093261242, ref_abs_avg=28.336816787719727, test_abs_avg=28.338573455810547
production_forward grad[56] vs paper_forward: mean_abs=0.5370489358901978, max_abs=2.0625, mean_rel=0.11898427456617355, max_rel=19.220109939575195, norm_rel=0.02452000603079796, ref_abs_avg=21.98394012451172, test_abs_avg=22.030197143554688
production_forward grad[57] vs paper_forward: mean_abs=0.6714156866073608, max_abs=5.5, mean_rel=0.16040337085723877, max_rel=1670.487548828125, norm_rel=0.02480393834412098, ref_abs_avg=27.122100830078125, test_abs_avg=27.1254940032959
production_forward grad[58] vs paper_forward: mean_abs=0.6229841709136963, max_abs=4.5, mean_rel=0.25052937865257263, max_rel=2218.75, norm_rel=0.023048115894198418, ref_abs_avg=27.054224014282227, test_abs_avg=27.052383422851562
production_forward grad[59] vs paper_forward: mean_abs=0.5045301914215088, max_abs=1.75, mean_rel=0.12164834886789322, max_rel=21.535314559936523, norm_rel=0.02365521341562271, ref_abs_avg=21.136619567871094, test_abs_avg=21.12721824645996
production_forward grad[60] vs paper_forward: mean_abs=0.6301783323287964, max_abs=5.0, mean_rel=0.15676508843898773, max_rel=1236.67333984375, norm_rel=0.024477645754814148, ref_abs_avg=25.80059814453125, test_abs_avg=25.79873275756836
production_forward grad[61] vs paper_forward: mean_abs=0.5849899053573608, max_abs=4.25, mean_rel=0.20337817072868347, max_rel=1406.2498779296875, norm_rel=0.022953180596232414, ref_abs_avg=25.514707565307617, test_abs_avg=25.51446533203125
production_forward grad[62] vs paper_forward: mean_abs=0.4488711357116699, max_abs=1.75, mean_rel=0.09518755972385406, max_rel=9.740891456604004, norm_rel=0.021343305706977844, ref_abs_avg=20.821928024291992, test_abs_avg=20.832929611206055
production_forward grad[63] vs paper_forward: mean_abs=0.5986669063568115, max_abs=6.0, mean_rel=0.15685683488845825, max_rel=727.3963012695312, norm_rel=0.024083511903882027, ref_abs_avg=24.869911193847656, test_abs_avg=24.86954689025879
production_forward grad[64] vs paper_forward: mean_abs=0.5499686598777771, max_abs=4.25, mean_rel=0.2563495337963104, max_rel=2375.0, norm_rel=0.022234514355659485, ref_abs_avg=24.68301010131836, test_abs_avg=24.68041229248047
production_forward grad[65] vs paper_forward: mean_abs=0.45436331629753113, max_abs=1.82421875, mean_rel=0.7424437999725342, max_rel=242.2499542236328, norm_rel=0.023217305541038513, ref_abs_avg=19.881362915039062, test_abs_avg=19.852210998535156
production_forward grad[66] vs paper_forward: mean_abs=0.5684493780136108, max_abs=6.0, mean_rel=0.1443517655134201, max_rel=699.6616821289062, norm_rel=0.023570267483592033, ref_abs_avg=24.140329360961914, test_abs_avg=24.140625
production_forward grad[67] vs paper_forward: mean_abs=0.5319273471832275, max_abs=4.5, mean_rel=0.2293834239244461, max_rel=1718.7498779296875, norm_rel=0.022179771214723587, ref_abs_avg=24.008941650390625, test_abs_avg=24.01681137084961
production_forward grad[68] vs paper_forward: mean_abs=0.4213443398475647, max_abs=1.5, mean_rel=0.6738296747207642, max_rel=299.0049743652344, norm_rel=0.021950209513306618, ref_abs_avg=19.205078125, test_abs_avg=19.23273468017578
production_forward grad[69] vs paper_forward: mean_abs=0.5334911346435547, max_abs=5.0, mean_rel=0.1468566358089447, max_rel=706.8729858398438, norm_rel=0.023244095966219902, ref_abs_avg=22.980670928955078, test_abs_avg=22.979541778564453
production_forward grad[70] vs paper_forward: mean_abs=0.4955715835094452, max_abs=3.640625, mean_rel=0.18964093923568726, max_rel=1624.9998779296875, norm_rel=0.021644996479153633, ref_abs_avg=22.837215423583984, test_abs_avg=22.8327579498291
production_forward grad[71] vs paper_forward: mean_abs=0.4183492660522461, max_abs=1.875, mean_rel=0.13891184329986572, max_rel=39.51945495605469, norm_rel=0.021020615473389626, ref_abs_avg=20.36181640625, test_abs_avg=20.347618103027344
production_forward grad[72] vs paper_forward: mean_abs=0.5166003108024597, max_abs=4.5, mean_rel=0.13722831010818481, max_rel=848.3492431640625, norm_rel=0.022950174286961555, ref_abs_avg=22.488754272460938, test_abs_avg=22.489418029785156
production_forward grad[73] vs paper_forward: mean_abs=0.47144103050231934, max_abs=4.0, mean_rel=0.20135556161403656, max_rel=1437.4998779296875, norm_rel=0.021378764882683754, ref_abs_avg=22.027130126953125, test_abs_avg=22.02974510192871
production_forward grad[74] vs paper_forward: mean_abs=0.43825578689575195, max_abs=1.625, mean_rel=0.13957394659519196, max_rel=38.692020416259766, norm_rel=0.023610848933458328, ref_abs_avg=19.002262115478516, test_abs_avg=19.005779266357422
production_forward grad[75] vs paper_forward: mean_abs=0.572036862373352, max_abs=4.5, mean_rel=0.1626746505498886, max_rel=1274.3756103515625, norm_rel=0.024523435160517693, ref_abs_avg=23.38382339477539, test_abs_avg=23.384559631347656
production_forward grad[76] vs paper_forward: mean_abs=0.5307968854904175, max_abs=4.1875, mean_rel=0.2268759310245514, max_rel=1843.7498779296875, norm_rel=0.02289128489792347, ref_abs_avg=23.287994384765625, test_abs_avg=23.288942337036133
production_forward grad[77] vs paper_forward: mean_abs=0.40941429138183594, max_abs=1.8125, mean_rel=0.0946609377861023, max_rel=6.803074359893799, norm_rel=0.021878628060221672, ref_abs_avg=18.47771453857422, test_abs_avg=18.5051326751709
production_forward grad[78] vs paper_forward: mean_abs=0.5260473489761353, max_abs=4.25, mean_rel=0.14384537935256958, max_rel=480.9143371582031, norm_rel=0.023789795115590096, ref_abs_avg=22.137359619140625, test_abs_avg=22.136722564697266
production_forward grad[79] vs paper_forward: mean_abs=0.48889774084091187, max_abs=3.75, mean_rel=0.2113284170627594, max_rel=1250.0, norm_rel=0.02263449691236019, ref_abs_avg=21.693626403808594, test_abs_avg=21.70268440246582
production_forward grad[80] vs paper_forward: mean_abs=0.3812694549560547, max_abs=1.625, mean_rel=0.06325623393058777, max_rel=3.1278200149536133, norm_rel=0.02271849662065506, ref_abs_avg=17.23468589782715, test_abs_avg=17.243099212646484
production_forward grad[81] vs paper_forward: mean_abs=0.4837799072265625, max_abs=5.0, mean_rel=0.1484089195728302, max_rel=793.2633056640625, norm_rel=0.023277509957551956, ref_abs_avg=20.85049057006836, test_abs_avg=20.851165771484375
production_forward grad[82] vs paper_forward: mean_abs=0.4518340229988098, max_abs=3.8125, mean_rel=0.2138252854347229, max_rel=1468.7498779296875, norm_rel=0.02169385552406311, ref_abs_avg=20.76464080810547, test_abs_avg=20.76896858215332
production_forward grad[83] vs paper_forward: mean_abs=0.3278770446777344, max_abs=1.25, mean_rel=0.13410848379135132, max_rel=16.05437469482422, norm_rel=0.020548274740576744, ref_abs_avg=15.80752944946289, test_abs_avg=15.805610656738281
production_forward grad[84] vs paper_forward: mean_abs=0.45851829648017883, max_abs=5.0, mean_rel=0.13937655091285706, max_rel=768.4234619140625, norm_rel=0.022526483982801437, ref_abs_avg=20.423803329467773, test_abs_avg=20.42552947998047
production_forward grad[85] vs paper_forward: mean_abs=0.4145107865333557, max_abs=4.0, mean_rel=0.18856123089790344, max_rel=1593.7498779296875, norm_rel=0.020460568368434906, ref_abs_avg=20.286243438720703, test_abs_avg=20.282140731811523
production_forward grad[86] vs paper_forward: mean_abs=0.3381919860839844, max_abs=1.5, mean_rel=0.0897628664970398, max_rel=5.936955451965332, norm_rel=0.020096246153116226, ref_abs_avg=16.7191104888916, test_abs_avg=16.704627990722656
production_forward grad[87] vs paper_forward: mean_abs=0.4338127374649048, max_abs=4.0, mean_rel=0.14242394268512726, max_rel=1353.918212890625, norm_rel=0.022493531927466393, ref_abs_avg=19.392419815063477, test_abs_avg=19.391273498535156
production_forward grad[88] vs paper_forward: mean_abs=0.40276259183883667, max_abs=4.0, mean_rel=0.19434663653373718, max_rel=1226.5625, norm_rel=0.02129313535988331, ref_abs_avg=19.07407569885254, test_abs_avg=19.070283889770508
production_forward grad[89] vs paper_forward: mean_abs=0.3222651481628418, max_abs=1.375, mean_rel=0.09383776038885117, max_rel=7.681109428405762, norm_rel=0.020886491984128952, ref_abs_avg=15.768205642700195, test_abs_avg=15.770414352416992
production_forward grad[90] vs paper_forward: mean_abs=0.40581464767456055, max_abs=4.0, mean_rel=0.13219013810157776, max_rel=389.99542236328125, norm_rel=0.02169518545269966, ref_abs_avg=18.862621307373047, test_abs_avg=18.862869262695312
production_forward grad[91] vs paper_forward: mean_abs=0.37594369053840637, max_abs=4.0, mean_rel=0.16684474050998688, max_rel=874.9999389648438, norm_rel=0.020584939047694206, ref_abs_avg=18.42877197265625, test_abs_avg=18.434673309326172
production_forward grad[92] vs paper_forward: mean_abs=0.30406635999679565, max_abs=1.15625, mean_rel=0.15132196247577667, max_rel=44.79315185546875, norm_rel=0.019646495580673218, ref_abs_avg=16.013463973999023, test_abs_avg=15.994158744812012
production_forward grad[93] vs paper_forward: mean_abs=0.3900204598903656, max_abs=5.0, mean_rel=0.1270671784877777, max_rel=452.8345947265625, norm_rel=0.02140078879892826, ref_abs_avg=18.43975067138672, test_abs_avg=18.437969207763672
production_forward grad[94] vs paper_forward: mean_abs=0.34810084104537964, max_abs=3.75, mean_rel=0.15695726871490479, max_rel=1062.5, norm_rel=0.0192633718252182, ref_abs_avg=18.2755126953125, test_abs_avg=18.266498565673828
production_forward grad[95] vs paper_forward: mean_abs=0.28409624099731445, max_abs=1.125, mean_rel=0.07845275849103928, max_rel=6.754331588745117, norm_rel=0.01888086274266243, ref_abs_avg=15.243175506591797, test_abs_avg=15.232295989990234
production_forward grad[96] vs paper_forward: mean_abs=0.3593784272670746, max_abs=4.0, mean_rel=0.1267668902873993, max_rel=628.427490234375, norm_rel=0.021102655678987503, ref_abs_avg=17.31606674194336, test_abs_avg=17.31464385986328
production_forward grad[97] vs paper_forward: mean_abs=0.3338397443294525, max_abs=4.0, mean_rel=0.15508130192756653, max_rel=1312.4998779296875, norm_rel=0.019444439560174942, ref_abs_avg=17.558147430419922, test_abs_avg=17.545896530151367
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016372859245166183, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008607758209109306, max_abs=0.453125, mean_rel=0.07379233092069626, max_rel=92.47954559326172, norm_rel=0.02005123160779476, ref_abs_avg=0.4628099501132965, test_abs_avg=0.4628117084503174
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.373456001281738, max_abs=52.0, mean_rel=0.3034015893936157, max_rel=2252.088623046875, norm_rel=0.020415406674146652, ref_abs_avg=319.65576171875, test_abs_avg=319.8633728027344
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.28425931930542, max_abs=5.0, mean_rel=0.19103598594665527, max_rel=29.76885223388672, norm_rel=0.023559987545013428, ref_abs_avg=54.56006622314453, test_abs_avg=54.586204528808594
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6277018785476685, max_abs=11.0, mean_rel=0.1703043133020401, max_rel=2871.1357421875, norm_rel=0.024897435680031776, ref_abs_avg=65.75184631347656, test_abs_avg=65.755126953125
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5072675943374634, max_abs=10.625, mean_rel=0.45124804973602295, max_rel=3484.374755859375, norm_rel=0.023343097418546677, ref_abs_avg=65.00297546386719, test_abs_avg=65.0086898803711
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1260347366333008, max_abs=4.0, mean_rel=0.0773206278681755, max_rel=3.655381917953491, norm_rel=0.02452310360968113, ref_abs_avg=46.38801574707031, test_abs_avg=46.400875091552734
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.434207558631897, max_abs=12.0, mean_rel=0.17256897687911987, max_rel=1989.3663330078125, norm_rel=0.024733491241931915, ref_abs_avg=58.367591857910156, test_abs_avg=58.369873046875
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3181331157684326, max_abs=8.0, mean_rel=0.3937772512435913, max_rel=4500.0, norm_rel=0.023107029497623444, ref_abs_avg=57.472023010253906, test_abs_avg=57.476131439208984
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0571261644363403, max_abs=4.875, mean_rel=0.16310235857963562, max_rel=20.810447692871094, norm_rel=0.025068867951631546, ref_abs_avg=43.11005783081055, test_abs_avg=43.06651306152344
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.307155728340149, max_abs=9.0, mean_rel=0.16839249432086945, max_rel=2211.853515625, norm_rel=0.024432368576526642, ref_abs_avg=53.86910629272461, test_abs_avg=53.86979293823242
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.197479486465454, max_abs=7.5, mean_rel=0.47001978754997253, max_rel=4343.75, norm_rel=0.022804562002420425, ref_abs_avg=52.80781173706055, test_abs_avg=52.80760192871094
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9018335342407227, max_abs=3.25, mean_rel=0.0871262401342392, max_rel=4.135599136352539, norm_rel=0.020348504185676575, ref_abs_avg=44.6590461730957, test_abs_avg=44.64284896850586
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1967687606811523, max_abs=9.0, mean_rel=0.15837274491786957, max_rel=1134.8404541015625, norm_rel=0.024209734052419662, ref_abs_avg=49.7492790222168, test_abs_avg=49.74772644042969
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1093238592147827, max_abs=7.0, mean_rel=0.3913179039955139, max_rel=3874.999755859375, norm_rel=0.022734757512807846, ref_abs_avg=48.99083709716797, test_abs_avg=48.98851776123047
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8596181869506836, max_abs=3.0, mean_rel=0.07281902432441711, max_rel=3.3038268089294434, norm_rel=0.021475305780768394, ref_abs_avg=39.881309509277344, test_abs_avg=39.89542770385742
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1225435733795166, max_abs=7.0, mean_rel=0.15324008464813232, max_rel=1377.324462890625, norm_rel=0.024019110947847366, ref_abs_avg=47.02465057373047, test_abs_avg=47.02512741088867
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0292744636535645, max_abs=6.75, mean_rel=0.2955978214740753, max_rel=3187.499755859375, norm_rel=0.022416798397898674, ref_abs_avg=46.133567810058594, test_abs_avg=46.14073181152344
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8530244827270508, max_abs=3.375, mean_rel=0.06791077554225922, max_rel=2.3948540687561035, norm_rel=0.021547464653849602, ref_abs_avg=39.607421875, test_abs_avg=39.60699462890625
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.045320987701416, max_abs=7.0, mean_rel=0.1610395610332489, max_rel=1131.390625, norm_rel=0.023781588301062584, ref_abs_avg=44.20476150512695, test_abs_avg=44.20806121826172
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9678810834884644, max_abs=6.0, mean_rel=0.28941771388053894, max_rel=3874.999755859375, norm_rel=0.022221624851226807, ref_abs_avg=43.7913703918457, test_abs_avg=43.792701721191406
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7764480113983154, max_abs=3.0, mean_rel=0.07846899330615997, max_rel=4.2494893074035645, norm_rel=0.023349134251475334, ref_abs_avg=33.33338928222656, test_abs_avg=33.26606750488281
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9965102076530457, max_abs=6.125, mean_rel=0.16498872637748718, max_rel=2245.43017578125, norm_rel=0.023708002641797066, ref_abs_avg=42.23298263549805, test_abs_avg=42.23133087158203
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9179816842079163, max_abs=6.0, mean_rel=0.2919922471046448, max_rel=2687.499755859375, norm_rel=0.022160785272717476, ref_abs_avg=41.61048889160156, test_abs_avg=41.61271286010742
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7680940628051758, max_abs=3.25, mean_rel=0.1683121621608734, max_rel=25.166364669799805, norm_rel=0.022960668429732323, ref_abs_avg=34.23834228515625, test_abs_avg=34.21211624145508
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9499266147613525, max_abs=6.0, mean_rel=0.15482786297798157, max_rel=1150.7615966796875, norm_rel=0.023535525426268578, ref_abs_avg=40.58786392211914, test_abs_avg=40.58545684814453
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8688327074050903, max_abs=5.125, mean_rel=0.2879944443702698, max_rel=3437.499755859375, norm_rel=0.021849792450666428, ref_abs_avg=39.93577575683594, test_abs_avg=39.941184997558594
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8760757446289062, max_abs=3.84375, mean_rel=0.13643738627433777, max_rel=10.457676887512207, norm_rel=0.025044113397598267, ref_abs_avg=35.917110443115234, test_abs_avg=35.966957092285156
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1036852598190308, max_abs=7.0, mean_rel=0.17665109038352966, max_rel=1935.4873046875, norm_rel=0.025445055216550827, ref_abs_avg=43.61906433105469, test_abs_avg=43.61879348754883
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0222923755645752, max_abs=6.5, mean_rel=0.3482453227043152, max_rel=4125.0, norm_rel=0.023741835728287697, ref_abs_avg=43.233497619628906, test_abs_avg=43.24449157714844
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8115124702453613, max_abs=3.5, mean_rel=0.1327705681324005, max_rel=8.863143920898438, norm_rel=0.024416424334049225, ref_abs_avg=33.34352493286133, test_abs_avg=33.382667541503906
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.012035846710205, max_abs=7.0, mean_rel=0.16742441058158875, max_rel=686.2757568359375, norm_rel=0.02576979249715805, ref_abs_avg=39.49658203125, test_abs_avg=39.496673583984375
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9402819871902466, max_abs=6.3125, mean_rel=0.32867276668548584, max_rel=2874.999755859375, norm_rel=0.02413569949567318, ref_abs_avg=39.143959045410156, test_abs_avg=39.14307403564453
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7599401473999023, max_abs=2.875, mean_rel=0.07513608038425446, max_rel=3.3601396083831787, norm_rel=0.02467956393957138, ref_abs_avg=31.040504455566406, test_abs_avg=30.971187591552734
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9393237829208374, max_abs=6.5, mean_rel=0.16372662782669067, max_rel=1113.513671875, norm_rel=0.02553405985236168, ref_abs_avg=36.96323776245117, test_abs_avg=36.962799072265625
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8710806369781494, max_abs=5.0625, mean_rel=0.27549177408218384, max_rel=2437.5, norm_rel=0.02400672435760498, ref_abs_avg=36.348297119140625, test_abs_avg=36.35612487792969
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7101974487304688, max_abs=3.625, mean_rel=0.06400883942842484, max_rel=1.9283437728881836, norm_rel=0.024127621203660965, ref_abs_avg=30.47289276123047, test_abs_avg=30.432525634765625
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8797214031219482, max_abs=6.0, mean_rel=0.16406890749931335, max_rel=1381.3623046875, norm_rel=0.025358112528920174, ref_abs_avg=34.82465362548828, test_abs_avg=34.82557678222656
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8173043727874756, max_abs=5.0, mean_rel=0.28586238622665405, max_rel=2328.125, norm_rel=0.02385769598186016, ref_abs_avg=34.327762603759766, test_abs_avg=34.330238342285156
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6595697402954102, max_abs=2.640625, mean_rel=0.09702770411968231, max_rel=10.471635818481445, norm_rel=0.023193607106804848, ref_abs_avg=28.42387580871582, test_abs_avg=28.444679260253906
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8311799764633179, max_abs=5.5, mean_rel=0.15335725247859955, max_rel=992.242431640625, norm_rel=0.024836070835590363, ref_abs_avg=33.57039260864258, test_abs_avg=33.57006072998047
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7739440202713013, max_abs=4.9375, mean_rel=0.3338998556137085, max_rel=2624.999755859375, norm_rel=0.02353387139737606, ref_abs_avg=32.9521484375, test_abs_avg=32.95030975341797
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.5945196151733398, max_abs=2.5, mean_rel=0.10467219352722168, max_rel=8.524645805358887, norm_rel=0.02138608880341053, ref_abs_avg=27.941038131713867, test_abs_avg=27.902111053466797
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7891237735748291, max_abs=5.5, mean_rel=0.1587260216474533, max_rel=986.1444091796875, norm_rel=0.02452809549868107, ref_abs_avg=32.22244644165039, test_abs_avg=32.2203369140625
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7376864552497864, max_abs=4.625, mean_rel=0.25584614276885986, max_rel=2140.625, norm_rel=0.023518288508057594, ref_abs_avg=31.433212280273438, test_abs_avg=31.425655364990234
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.579016923904419, max_abs=2.3125, mean_rel=0.46345436573028564, max_rel=184.6605987548828, norm_rel=0.02144419401884079, ref_abs_avg=26.964502334594727, test_abs_avg=26.982521057128906
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7461744546890259, max_abs=5.5, mean_rel=0.1625673472881317, max_rel=899.0947875976562, norm_rel=0.024577831849455833, ref_abs_avg=30.483619689941406, test_abs_avg=30.484481811523438
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.69345623254776, max_abs=4.5625, mean_rel=0.2950112223625183, max_rel=2624.999755859375, norm_rel=0.023060249164700508, ref_abs_avg=30.13311767578125, test_abs_avg=30.131118774414062
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.542881965637207, max_abs=2.375, mean_rel=0.15821251273155212, max_rel=14.007718086242676, norm_rel=0.02192109450697899, ref_abs_avg=25.677217483520508, test_abs_avg=25.731443405151367
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.723981499671936, max_abs=5.0, mean_rel=0.15263456106185913, max_rel=864.972900390625, norm_rel=0.02419014647603035, ref_abs_avg=29.997859954833984, test_abs_avg=29.99701690673828
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6701165437698364, max_abs=5.75, mean_rel=0.2524714469909668, max_rel=2437.5, norm_rel=0.023000802844762802, ref_abs_avg=29.202930450439453, test_abs_avg=29.206920623779297
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6150534152984619, max_abs=2.25, mean_rel=0.08492842316627502, max_rel=4.114330291748047, norm_rel=0.024564003571867943, ref_abs_avg=24.933258056640625, test_abs_avg=24.90380859375
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7921689748764038, max_abs=5.5, mean_rel=0.16826897859573364, max_rel=1694.6427001953125, norm_rel=0.026081031188368797, ref_abs_avg=30.463340759277344, test_abs_avg=30.461353302001953
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7408914566040039, max_abs=4.5, mean_rel=0.2599327564239502, max_rel=1968.7498779296875, norm_rel=0.024428531527519226, ref_abs_avg=30.374610900878906, test_abs_avg=30.37493133544922
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5799379348754883, max_abs=2.25, mean_rel=0.0983581691980362, max_rel=4.007350921630859, norm_rel=0.025029411539435387, ref_abs_avg=22.766483306884766, test_abs_avg=22.76114273071289
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7303225994110107, max_abs=5.0, mean_rel=0.16531501710414886, max_rel=1504.5592041015625, norm_rel=0.02553885243833065, ref_abs_avg=28.662372589111328, test_abs_avg=28.664701461791992
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6811119318008423, max_abs=4.875, mean_rel=0.2764222025871277, max_rel=1999.9998779296875, norm_rel=0.024075401946902275, ref_abs_avg=28.336816787719727, test_abs_avg=28.340375900268555
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5533472299575806, max_abs=2.125, mean_rel=0.11507842689752579, max_rel=9.197935104370117, norm_rel=0.025139478966593742, ref_abs_avg=21.98394012451172, test_abs_avg=22.024221420288086
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6806949377059937, max_abs=5.5, mean_rel=0.16214722394943237, max_rel=2199.857177734375, norm_rel=0.02514435164630413, ref_abs_avg=27.122100830078125, test_abs_avg=27.12449836730957
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6330724358558655, max_abs=4.5, mean_rel=0.2616455554962158, max_rel=1671.8748779296875, norm_rel=0.023410530760884285, ref_abs_avg=27.054224014282227, test_abs_avg=27.052967071533203
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.4850003719329834, max_abs=1.84375, mean_rel=0.09952110052108765, max_rel=10.969922065734863, norm_rel=0.02316589280962944, ref_abs_avg=21.136619567871094, test_abs_avg=21.109819412231445
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.638830304145813, max_abs=5.0, mean_rel=0.15771666169166565, max_rel=791.3847045898438, norm_rel=0.024800673127174377, ref_abs_avg=25.80059814453125, test_abs_avg=25.79819107055664
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5952491760253906, max_abs=4.25, mean_rel=0.2172667682170868, max_rel=1499.9998779296875, norm_rel=0.023353053256869316, ref_abs_avg=25.514707565307617, test_abs_avg=25.512914657592773
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4532322883605957, max_abs=2.0, mean_rel=0.0854598879814148, max_rel=6.628992557525635, norm_rel=0.021526237949728966, ref_abs_avg=20.821928024291992, test_abs_avg=20.829612731933594
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6053097248077393, max_abs=5.5, mean_rel=0.15770797431468964, max_rel=872.9131469726562, norm_rel=0.02434506267309189, ref_abs_avg=24.869911193847656, test_abs_avg=24.86980438232422
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5557141900062561, max_abs=4.0, mean_rel=0.26569443941116333, max_rel=1937.4998779296875, norm_rel=0.02246524766087532, ref_abs_avg=24.68301010131836, test_abs_avg=24.679725646972656
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4547841250896454, max_abs=1.9375, mean_rel=0.3592339754104614, max_rel=80.41735076904297, norm_rel=0.02301146648824215, ref_abs_avg=19.881362915039062, test_abs_avg=19.860877990722656
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5742831230163574, max_abs=6.0, mean_rel=0.14694112539291382, max_rel=1199.3583984375, norm_rel=0.02380143664777279, ref_abs_avg=24.140329360961914, test_abs_avg=24.141494750976562
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5343167781829834, max_abs=5.0, mean_rel=0.2209184169769287, max_rel=1781.2498779296875, norm_rel=0.0222748052328825, ref_abs_avg=24.008941650390625, test_abs_avg=24.016828536987305
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4219251275062561, max_abs=1.625, mean_rel=0.5099871158599854, max_rel=220.92233276367188, norm_rel=0.022036371752619743, ref_abs_avg=19.205078125, test_abs_avg=19.227752685546875
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5382887125015259, max_abs=4.5, mean_rel=0.14751693606376648, max_rel=599.176025390625, norm_rel=0.023469891399145126, ref_abs_avg=22.980670928955078, test_abs_avg=22.979888916015625
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.4994756579399109, max_abs=3.625, mean_rel=0.19899457693099976, max_rel=2156.25, norm_rel=0.02180195413529873, ref_abs_avg=22.837215423583984, test_abs_avg=22.836759567260742
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.42297840118408203, max_abs=1.625, mean_rel=0.17447710037231445, max_rel=52.14506530761719, norm_rel=0.02122931368649006, ref_abs_avg=20.36181640625, test_abs_avg=20.366609573364258
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5212001800537109, max_abs=5.0, mean_rel=0.1388438493013382, max_rel=1083.2310791015625, norm_rel=0.023154370486736298, ref_abs_avg=22.488754272460938, test_abs_avg=22.489681243896484
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4771631360054016, max_abs=3.625, mean_rel=0.20348089933395386, max_rel=1359.3748779296875, norm_rel=0.02160566858947277, ref_abs_avg=22.027130126953125, test_abs_avg=22.026718139648438
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4388742446899414, max_abs=1.75, mean_rel=0.14748947322368622, max_rel=38.276954650878906, norm_rel=0.02400524914264679, ref_abs_avg=19.002262115478516, test_abs_avg=19.016305923461914
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5804200172424316, max_abs=5.0, mean_rel=0.16661521792411804, max_rel=1330.1683349609375, norm_rel=0.024863099679350853, ref_abs_avg=23.38382339477539, test_abs_avg=23.38416290283203
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5404467582702637, max_abs=4.3125, mean_rel=0.229959636926651, max_rel=1812.4998779296875, norm_rel=0.02330606058239937, ref_abs_avg=23.287994384765625, test_abs_avg=23.289615631103516
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4180488586425781, max_abs=1.625, mean_rel=0.0842357873916626, max_rel=5.560904502868652, norm_rel=0.022024812176823616, ref_abs_avg=18.47771453857422, test_abs_avg=18.496437072753906
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5323860049247742, max_abs=4.5, mean_rel=0.14735683798789978, max_rel=515.1685180664062, norm_rel=0.024062734097242355, ref_abs_avg=22.137359619140625, test_abs_avg=22.137800216674805
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4926741123199463, max_abs=4.25, mean_rel=0.2050730437040329, max_rel=1203.125, norm_rel=0.02280055172741413, ref_abs_avg=21.693626403808594, test_abs_avg=21.702119827270508
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3608112335205078, max_abs=1.625, mean_rel=0.06315332651138306, max_rel=3.68159818649292, norm_rel=0.02183787152171135, ref_abs_avg=17.23468589782715, test_abs_avg=17.23355484008789
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.4883810579776764, max_abs=4.5, mean_rel=0.14996634423732758, max_rel=746.2698364257812, norm_rel=0.023485861718654633, ref_abs_avg=20.85049057006836, test_abs_avg=20.850955963134766
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.452648401260376, max_abs=3.59375, mean_rel=0.21656200289726257, max_rel=1499.9998779296875, norm_rel=0.02173270285129547, ref_abs_avg=20.76464080810547, test_abs_avg=20.764076232910156
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3508310317993164, max_abs=1.25, mean_rel=0.10991144925355911, max_rel=9.593061447143555, norm_rel=0.021919431164860725, ref_abs_avg=15.80752944946289, test_abs_avg=15.79844856262207
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.46205025911331177, max_abs=5.5, mean_rel=0.14142867922782898, max_rel=952.9501342773438, norm_rel=0.02268730103969574, ref_abs_avg=20.423803329467773, test_abs_avg=20.425735473632812
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.41528767347335815, max_abs=4.0625, mean_rel=0.18324324488639832, max_rel=1343.7498779296875, norm_rel=0.020447734743356705, ref_abs_avg=20.286243438720703, test_abs_avg=20.28391456604004
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.32669591903686523, max_abs=1.25, mean_rel=0.08714333176612854, max_rel=4.3954691886901855, norm_rel=0.019549647346138954, ref_abs_avg=16.7191104888916, test_abs_avg=16.706966400146484
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4363020658493042, max_abs=4.0, mean_rel=0.14793018996715546, max_rel=979.451904296875, norm_rel=0.022618357092142105, ref_abs_avg=19.392419815063477, test_abs_avg=19.39170265197754
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.40092530846595764, max_abs=5.0, mean_rel=0.1953563094139099, max_rel=1281.25, norm_rel=0.021154064685106277, ref_abs_avg=19.07407569885254, test_abs_avg=19.071319580078125
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.32340294122695923, max_abs=1.375, mean_rel=0.08840008080005646, max_rel=5.945061206817627, norm_rel=0.020753707736730576, ref_abs_avg=15.768205642700195, test_abs_avg=15.764816284179688
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.408433735370636, max_abs=4.125, mean_rel=0.13410118222236633, max_rel=477.633544921875, norm_rel=0.02182583510875702, ref_abs_avg=18.862621307373047, test_abs_avg=18.862590789794922
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.37894946336746216, max_abs=4.25, mean_rel=0.16966718435287476, max_rel=1187.5, norm_rel=0.0207638218998909, ref_abs_avg=18.42877197265625, test_abs_avg=18.43560791015625
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.30989784002304077, max_abs=1.25, mean_rel=0.1141991913318634, max_rel=27.56589126586914, norm_rel=0.019878309220075607, ref_abs_avg=16.013463973999023, test_abs_avg=15.995319366455078
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.39141032099723816, max_abs=5.0, mean_rel=0.1241198182106018, max_rel=415.77813720703125, norm_rel=0.021469520404934883, ref_abs_avg=18.43975067138672, test_abs_avg=18.43788719177246
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3449894189834595, max_abs=3.25, mean_rel=0.16216129064559937, max_rel=976.5624389648438, norm_rel=0.01905611902475357, ref_abs_avg=18.2755126953125, test_abs_avg=18.26223373413086
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2863086462020874, max_abs=1.25, mean_rel=0.08854902535676956, max_rel=5.831538677215576, norm_rel=0.019013410434126854, ref_abs_avg=15.243175506591797, test_abs_avg=15.236108779907227
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.36008381843566895, max_abs=4.5, mean_rel=0.12605421245098114, max_rel=684.3433837890625, norm_rel=0.02113521285355091, ref_abs_avg=17.31606674194336, test_abs_avg=17.314605712890625
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.34008461236953735, max_abs=4.5, mean_rel=0.16607755422592163, max_rel=1250.0, norm_rel=0.0197917353361845, ref_abs_avg=17.558147430419922, test_abs_avg=17.541919708251953
production_forward2 vs paper_forward output: mean_abs=0.001633478095754981, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008257539011538029, max_abs=0.515625, mean_rel=0.07111905515193939, max_rel=98.14385986328125, norm_rel=0.019344216212630272, ref_abs_avg=0.4628099501132965, test_abs_avg=0.4628225862979889
production_forward2 grad[1] vs paper_forward: mean_abs=7.205407619476318, max_abs=64.0, mean_rel=0.18722547590732574, max_rel=544.744873046875, norm_rel=0.01995508372783661, ref_abs_avg=319.65576171875, test_abs_avg=319.8017272949219
production_forward2 grad[2] vs paper_forward: mean_abs=1.2501263618469238, max_abs=5.0, mean_rel=0.14010962843894958, max_rel=17.962556838989258, norm_rel=0.02306847833096981, ref_abs_avg=54.56006622314453, test_abs_avg=54.65341567993164
production_forward2 grad[3] vs paper_forward: mean_abs=1.5758439302444458, max_abs=10.0, mean_rel=0.16625526547431946, max_rel=1835.102783203125, norm_rel=0.024142036214470863, ref_abs_avg=65.75184631347656, test_abs_avg=65.758056640625
production_forward2 grad[4] vs paper_forward: mean_abs=1.457839012145996, max_abs=9.0, mean_rel=0.41262924671173096, max_rel=3515.624755859375, norm_rel=0.022567598149180412, ref_abs_avg=65.00297546386719, test_abs_avg=65.00471496582031
production_forward2 grad[5] vs paper_forward: mean_abs=1.0926523208618164, max_abs=4.5, mean_rel=0.07541301846504211, max_rel=3.1661598682403564, norm_rel=0.023604294285178185, ref_abs_avg=46.38801574707031, test_abs_avg=46.323875427246094
production_forward2 grad[6] vs paper_forward: mean_abs=1.3886687755584717, max_abs=11.0, mean_rel=0.1592673361301422, max_rel=1329.184814453125, norm_rel=0.023985208943486214, ref_abs_avg=58.367591857910156, test_abs_avg=58.368621826171875
production_forward2 grad[7] vs paper_forward: mean_abs=1.2730573415756226, max_abs=8.0, mean_rel=0.3703366816043854, max_rel=4187.5, norm_rel=0.02231862209737301, ref_abs_avg=57.472023010253906, test_abs_avg=57.47222137451172
production_forward2 grad[8] vs paper_forward: mean_abs=1.0403048992156982, max_abs=3.5, mean_rel=0.14239874482154846, max_rel=30.63620948791504, norm_rel=0.024217864498496056, ref_abs_avg=43.11005783081055, test_abs_avg=43.052635192871094
production_forward2 grad[9] vs paper_forward: mean_abs=1.270287036895752, max_abs=9.0, mean_rel=0.1637004017829895, max_rel=2919.664794921875, norm_rel=0.023756330832839012, ref_abs_avg=53.86910629272461, test_abs_avg=53.86975860595703
production_forward2 grad[10] vs paper_forward: mean_abs=1.1647045612335205, max_abs=6.75, mean_rel=0.4613497257232666, max_rel=4781.25, norm_rel=0.022197721526026726, ref_abs_avg=52.80781173706055, test_abs_avg=52.80363464355469
production_forward2 grad[11] vs paper_forward: mean_abs=0.9172935485839844, max_abs=3.25, mean_rel=0.0887511670589447, max_rel=4.6226911544799805, norm_rel=0.020775172859430313, ref_abs_avg=44.6590461730957, test_abs_avg=44.663719177246094
production_forward2 grad[12] vs paper_forward: mean_abs=1.1649024486541748, max_abs=8.0, mean_rel=0.1563510298728943, max_rel=1522.494384765625, norm_rel=0.023562293499708176, ref_abs_avg=49.7492790222168, test_abs_avg=49.7506103515625
production_forward2 grad[13] vs paper_forward: mean_abs=1.0668437480926514, max_abs=6.25, mean_rel=0.37649062275886536, max_rel=2874.999755859375, norm_rel=0.02188897505402565, ref_abs_avg=48.99083709716797, test_abs_avg=48.98691940307617
production_forward2 grad[14] vs paper_forward: mean_abs=0.848663330078125, max_abs=3.25, mean_rel=0.10763120651245117, max_rel=17.43576431274414, norm_rel=0.021005135029554367, ref_abs_avg=39.881309509277344, test_abs_avg=39.87255859375
production_forward2 grad[15] vs paper_forward: mean_abs=1.094990611076355, max_abs=7.0, mean_rel=0.15183836221694946, max_rel=1133.0506591796875, norm_rel=0.023444026708602905, ref_abs_avg=47.02465057373047, test_abs_avg=47.02617263793945
production_forward2 grad[16] vs paper_forward: mean_abs=1.0013651847839355, max_abs=6.75, mean_rel=0.2980579733848572, max_rel=2687.499755859375, norm_rel=0.021821144968271255, ref_abs_avg=46.133567810058594, test_abs_avg=46.14125442504883
production_forward2 grad[17] vs paper_forward: mean_abs=0.7903375625610352, max_abs=3.25, mean_rel=0.05996483936905861, max_rel=1.883025050163269, norm_rel=0.020611261948943138, ref_abs_avg=39.607421875, test_abs_avg=39.604637145996094
production_forward2 grad[18] vs paper_forward: mean_abs=1.0209214687347412, max_abs=8.0, mean_rel=0.1617361456155777, max_rel=1372.2750244140625, norm_rel=0.023234162479639053, ref_abs_avg=44.20476150512695, test_abs_avg=44.20779037475586
production_forward2 grad[19] vs paper_forward: mean_abs=0.9398967623710632, max_abs=5.78125, mean_rel=0.2744932174682617, max_rel=3499.999755859375, norm_rel=0.021568888798356056, ref_abs_avg=43.7913703918457, test_abs_avg=43.795169830322266
production_forward2 grad[20] vs paper_forward: mean_abs=0.7523727416992188, max_abs=3.0, mean_rel=0.09606119245290756, max_rel=9.970006942749023, norm_rel=0.0224323607981205, ref_abs_avg=33.33338928222656, test_abs_avg=33.290008544921875
production_forward2 grad[21] vs paper_forward: mean_abs=0.974308431148529, max_abs=7.5, mean_rel=0.16259215772151947, max_rel=1903.1201171875, norm_rel=0.023200633004307747, ref_abs_avg=42.23298263549805, test_abs_avg=42.23289489746094
production_forward2 grad[22] vs paper_forward: mean_abs=0.8944091796875, max_abs=5.78125, mean_rel=0.29242298007011414, max_rel=3218.749755859375, norm_rel=0.02158360555768013, ref_abs_avg=41.61048889160156, test_abs_avg=41.61630630493164
production_forward2 grad[23] vs paper_forward: mean_abs=0.7243638038635254, max_abs=2.96875, mean_rel=0.13465692102909088, max_rel=11.623663902282715, norm_rel=0.021558180451393127, ref_abs_avg=34.23834228515625, test_abs_avg=34.232994079589844
production_forward2 grad[24] vs paper_forward: mean_abs=0.9288046360015869, max_abs=7.0, mean_rel=0.14933937788009644, max_rel=1043.4564208984375, norm_rel=0.02302238903939724, ref_abs_avg=40.58786392211914, test_abs_avg=40.58925247192383
production_forward2 grad[25] vs paper_forward: mean_abs=0.8466500043869019, max_abs=5.21875, mean_rel=0.2855740785598755, max_rel=2593.749755859375, norm_rel=0.021292950958013535, ref_abs_avg=39.93577575683594, test_abs_avg=39.9398193359375
production_forward2 grad[26] vs paper_forward: mean_abs=0.8299088478088379, max_abs=4.28125, mean_rel=0.13300447165966034, max_rel=25.22145652770996, norm_rel=0.024393700063228607, ref_abs_avg=35.917110443115234, test_abs_avg=35.963890075683594
production_forward2 grad[27] vs paper_forward: mean_abs=1.07735276222229, max_abs=7.0, mean_rel=0.17432911694049835, max_rel=2506.41845703125, norm_rel=0.024836309254169464, ref_abs_avg=43.61906433105469, test_abs_avg=43.61848449707031
production_forward2 grad[28] vs paper_forward: mean_abs=0.9966209530830383, max_abs=6.0625, mean_rel=0.33297285437583923, max_rel=4750.0, norm_rel=0.023141777142882347, ref_abs_avg=43.233497619628906, test_abs_avg=43.244937896728516
production_forward2 grad[29] vs paper_forward: mean_abs=0.7989349365234375, max_abs=3.25, mean_rel=0.13423548638820648, max_rel=8.45030403137207, norm_rel=0.02400190569460392, ref_abs_avg=33.34352493286133, test_abs_avg=33.42670822143555
production_forward2 grad[30] vs paper_forward: mean_abs=0.9897015690803528, max_abs=7.5, mean_rel=0.16684892773628235, max_rel=910.2296142578125, norm_rel=0.025188898667693138, ref_abs_avg=39.49658203125, test_abs_avg=39.49785232543945
production_forward2 grad[31] vs paper_forward: mean_abs=0.9220681190490723, max_abs=6.625, mean_rel=0.3293399512767792, max_rel=2749.999755859375, norm_rel=0.023667411878705025, ref_abs_avg=39.143959045410156, test_abs_avg=39.14204025268555
production_forward2 grad[32] vs paper_forward: mean_abs=0.7363872528076172, max_abs=3.0, mean_rel=0.07279999554157257, max_rel=4.175570964813232, norm_rel=0.023979634046554565, ref_abs_avg=31.040504455566406, test_abs_avg=30.954687118530273
production_forward2 grad[33] vs paper_forward: mean_abs=0.9198131561279297, max_abs=5.75, mean_rel=0.16121244430541992, max_rel=1032.059326171875, norm_rel=0.0250199344009161, ref_abs_avg=36.96323776245117, test_abs_avg=36.963958740234375
production_forward2 grad[34] vs paper_forward: mean_abs=0.8514114618301392, max_abs=5.0, mean_rel=0.2690410912036896, max_rel=2187.5, norm_rel=0.023482685908675194, ref_abs_avg=36.348297119140625, test_abs_avg=36.35802459716797
production_forward2 grad[35] vs paper_forward: mean_abs=0.6947860717773438, max_abs=2.75, mean_rel=0.05630755424499512, max_rel=2.228774070739746, norm_rel=0.02346654236316681, ref_abs_avg=30.47289276123047, test_abs_avg=30.408447265625
production_forward2 grad[36] vs paper_forward: mean_abs=0.8642527461051941, max_abs=6.0, mean_rel=0.16477379202842712, max_rel=1431.9102783203125, norm_rel=0.024927562102675438, ref_abs_avg=34.82465362548828, test_abs_avg=34.82550048828125
production_forward2 grad[37] vs paper_forward: mean_abs=0.8001699447631836, max_abs=4.875, mean_rel=0.28861457109451294, max_rel=2500.0, norm_rel=0.02334744483232498, ref_abs_avg=34.327762603759766, test_abs_avg=34.330406188964844
production_forward2 grad[38] vs paper_forward: mean_abs=0.6534538269042969, max_abs=2.625, mean_rel=0.10874753445386887, max_rel=17.48021125793457, norm_rel=0.02278260886669159, ref_abs_avg=28.42387580871582, test_abs_avg=28.45887565612793
production_forward2 grad[39] vs paper_forward: mean_abs=0.8176848888397217, max_abs=5.5, mean_rel=0.15434539318084717, max_rel=960.3948974609375, norm_rel=0.024433588609099388, ref_abs_avg=33.57039260864258, test_abs_avg=33.571815490722656
production_forward2 grad[40] vs paper_forward: mean_abs=0.7580397725105286, max_abs=5.0, mean_rel=0.31103256344795227, max_rel=2749.999755859375, norm_rel=0.02306702546775341, ref_abs_avg=32.9521484375, test_abs_avg=32.95039367675781
production_forward2 grad[41] vs paper_forward: mean_abs=0.5957679748535156, max_abs=2.359375, mean_rel=0.10184772312641144, max_rel=7.174206733703613, norm_rel=0.02174048125743866, ref_abs_avg=27.941038131713867, test_abs_avg=27.903200149536133
production_forward2 grad[42] vs paper_forward: mean_abs=0.7780894637107849, max_abs=6.0, mean_rel=0.1556810438632965, max_rel=1125.3284912109375, norm_rel=0.024189114570617676, ref_abs_avg=32.22244644165039, test_abs_avg=32.222801208496094
production_forward2 grad[43] vs paper_forward: mean_abs=0.7217513918876648, max_abs=4.5, mean_rel=0.2728955149650574, max_rel=2125.0, norm_rel=0.023003630340099335, ref_abs_avg=31.433212280273438, test_abs_avg=31.42707061767578
production_forward2 grad[44] vs paper_forward: mean_abs=0.5361502170562744, max_abs=2.25, mean_rel=0.4353620409965515, max_rel=170.19097900390625, norm_rel=0.020043810829520226, ref_abs_avg=26.964502334594727, test_abs_avg=26.959003448486328
production_forward2 grad[45] vs paper_forward: mean_abs=0.735532283782959, max_abs=5.0, mean_rel=0.16005302965641022, max_rel=942.637939453125, norm_rel=0.024224815890192986, ref_abs_avg=30.483619689941406, test_abs_avg=30.48486328125
production_forward2 grad[46] vs paper_forward: mean_abs=0.6788387298583984, max_abs=4.625, mean_rel=0.27982914447784424, max_rel=3062.499755859375, norm_rel=0.022588223218917847, ref_abs_avg=30.13311767578125, test_abs_avg=30.131362915039062
production_forward2 grad[47] vs paper_forward: mean_abs=0.5372967720031738, max_abs=2.25, mean_rel=0.15617360174655914, max_rel=12.174405097961426, norm_rel=0.021525949239730835, ref_abs_avg=25.677217483520508, test_abs_avg=25.734899520874023
production_forward2 grad[48] vs paper_forward: mean_abs=0.7150889039039612, max_abs=6.0, mean_rel=0.1527353674173355, max_rel=1376.7135009765625, norm_rel=0.02390715666115284, ref_abs_avg=29.997859954833984, test_abs_avg=29.996402740478516
production_forward2 grad[49] vs paper_forward: mean_abs=0.6594635248184204, max_abs=5.25, mean_rel=0.2408795803785324, max_rel=2250.0, norm_rel=0.02263766899704933, ref_abs_avg=29.202930450439453, test_abs_avg=29.20625114440918
production_forward2 grad[50] vs paper_forward: mean_abs=0.6336860656738281, max_abs=2.5625, mean_rel=0.0932953804731369, max_rel=6.062620639801025, norm_rel=0.02529011107981205, ref_abs_avg=24.933258056640625, test_abs_avg=24.88225555419922
production_forward2 grad[51] vs paper_forward: mean_abs=0.7809044122695923, max_abs=5.5, mean_rel=0.1683441400527954, max_rel=1147.653076171875, norm_rel=0.025707386434078217, ref_abs_avg=30.463340759277344, test_abs_avg=30.462684631347656
production_forward2 grad[52] vs paper_forward: mean_abs=0.7300242185592651, max_abs=4.5, mean_rel=0.2669707238674164, max_rel=2265.625, norm_rel=0.024059684947133064, ref_abs_avg=30.374610900878906, test_abs_avg=30.377796173095703
production_forward2 grad[53] vs paper_forward: mean_abs=0.5752172470092773, max_abs=1.9375, mean_rel=0.09623073041439056, max_rel=3.0807957649230957, norm_rel=0.02446109987795353, ref_abs_avg=22.766483306884766, test_abs_avg=22.77743911743164
production_forward2 grad[54] vs paper_forward: mean_abs=0.7191876173019409, max_abs=5.0, mean_rel=0.16423672437667847, max_rel=1426.380859375, norm_rel=0.025156868621706963, ref_abs_avg=28.662372589111328, test_abs_avg=28.664464950561523
production_forward2 grad[55] vs paper_forward: mean_abs=0.6673415899276733, max_abs=4.5, mean_rel=0.26541346311569214, max_rel=1937.4998779296875, norm_rel=0.02361196093261242, ref_abs_avg=28.336816787719727, test_abs_avg=28.338573455810547
production_forward2 grad[56] vs paper_forward: mean_abs=0.5370489358901978, max_abs=2.0625, mean_rel=0.11898427456617355, max_rel=19.220109939575195, norm_rel=0.02452000603079796, ref_abs_avg=21.98394012451172, test_abs_avg=22.030197143554688
production_forward2 grad[57] vs paper_forward: mean_abs=0.6714156866073608, max_abs=5.5, mean_rel=0.16040337085723877, max_rel=1670.487548828125, norm_rel=0.02480393834412098, ref_abs_avg=27.122100830078125, test_abs_avg=27.1254940032959
production_forward2 grad[58] vs paper_forward: mean_abs=0.6229841709136963, max_abs=4.5, mean_rel=0.25052937865257263, max_rel=2218.75, norm_rel=0.023048115894198418, ref_abs_avg=27.054224014282227, test_abs_avg=27.052383422851562
production_forward2 grad[59] vs paper_forward: mean_abs=0.5045301914215088, max_abs=1.75, mean_rel=0.12164834886789322, max_rel=21.535314559936523, norm_rel=0.02365521341562271, ref_abs_avg=21.136619567871094, test_abs_avg=21.12721824645996
production_forward2 grad[60] vs paper_forward: mean_abs=0.6301783323287964, max_abs=5.0, mean_rel=0.15676508843898773, max_rel=1236.67333984375, norm_rel=0.024477645754814148, ref_abs_avg=25.80059814453125, test_abs_avg=25.79873275756836
production_forward2 grad[61] vs paper_forward: mean_abs=0.5849899053573608, max_abs=4.25, mean_rel=0.20337817072868347, max_rel=1406.2498779296875, norm_rel=0.022953180596232414, ref_abs_avg=25.514707565307617, test_abs_avg=25.51446533203125
production_forward2 grad[62] vs paper_forward: mean_abs=0.4488711357116699, max_abs=1.75, mean_rel=0.09518755972385406, max_rel=9.740891456604004, norm_rel=0.021343305706977844, ref_abs_avg=20.821928024291992, test_abs_avg=20.832929611206055
production_forward2 grad[63] vs paper_forward: mean_abs=0.5986669063568115, max_abs=6.0, mean_rel=0.15685683488845825, max_rel=727.3963012695312, norm_rel=0.024083511903882027, ref_abs_avg=24.869911193847656, test_abs_avg=24.86954689025879
production_forward2 grad[64] vs paper_forward: mean_abs=0.5499686598777771, max_abs=4.25, mean_rel=0.2563495337963104, max_rel=2375.0, norm_rel=0.022234514355659485, ref_abs_avg=24.68301010131836, test_abs_avg=24.68041229248047
production_forward2 grad[65] vs paper_forward: mean_abs=0.45436331629753113, max_abs=1.82421875, mean_rel=0.7424437999725342, max_rel=242.2499542236328, norm_rel=0.023217305541038513, ref_abs_avg=19.881362915039062, test_abs_avg=19.852210998535156
production_forward2 grad[66] vs paper_forward: mean_abs=0.5684493780136108, max_abs=6.0, mean_rel=0.1443517655134201, max_rel=699.6616821289062, norm_rel=0.023570267483592033, ref_abs_avg=24.140329360961914, test_abs_avg=24.140625
production_forward2 grad[67] vs paper_forward: mean_abs=0.5319273471832275, max_abs=4.5, mean_rel=0.2293834239244461, max_rel=1718.7498779296875, norm_rel=0.022179771214723587, ref_abs_avg=24.008941650390625, test_abs_avg=24.01681137084961
production_forward2 grad[68] vs paper_forward: mean_abs=0.4213443398475647, max_abs=1.5, mean_rel=0.6738296747207642, max_rel=299.0049743652344, norm_rel=0.021950209513306618, ref_abs_avg=19.205078125, test_abs_avg=19.23273468017578
production_forward2 grad[69] vs paper_forward: mean_abs=0.5334911346435547, max_abs=5.0, mean_rel=0.1468566358089447, max_rel=706.8729858398438, norm_rel=0.023244095966219902, ref_abs_avg=22.980670928955078, test_abs_avg=22.979541778564453
production_forward2 grad[70] vs paper_forward: mean_abs=0.4955715835094452, max_abs=3.640625, mean_rel=0.18964093923568726, max_rel=1624.9998779296875, norm_rel=0.021644996479153633, ref_abs_avg=22.837215423583984, test_abs_avg=22.8327579498291
production_forward2 grad[71] vs paper_forward: mean_abs=0.4183492660522461, max_abs=1.875, mean_rel=0.13891184329986572, max_rel=39.51945495605469, norm_rel=0.021020615473389626, ref_abs_avg=20.36181640625, test_abs_avg=20.347618103027344
production_forward2 grad[72] vs paper_forward: mean_abs=0.5166003108024597, max_abs=4.5, mean_rel=0.13722831010818481, max_rel=848.3492431640625, norm_rel=0.022950174286961555, ref_abs_avg=22.488754272460938, test_abs_avg=22.489418029785156
production_forward2 grad[73] vs paper_forward: mean_abs=0.47144103050231934, max_abs=4.0, mean_rel=0.20135556161403656, max_rel=1437.4998779296875, norm_rel=0.021378764882683754, ref_abs_avg=22.027130126953125, test_abs_avg=22.02974510192871
production_forward2 grad[74] vs paper_forward: mean_abs=0.43825578689575195, max_abs=1.625, mean_rel=0.13957394659519196, max_rel=38.692020416259766, norm_rel=0.023610848933458328, ref_abs_avg=19.002262115478516, test_abs_avg=19.005779266357422
production_forward2 grad[75] vs paper_forward: mean_abs=0.572036862373352, max_abs=4.5, mean_rel=0.1626746505498886, max_rel=1274.3756103515625, norm_rel=0.024523435160517693, ref_abs_avg=23.38382339477539, test_abs_avg=23.384559631347656
production_forward2 grad[76] vs paper_forward: mean_abs=0.5307968854904175, max_abs=4.1875, mean_rel=0.2268759310245514, max_rel=1843.7498779296875, norm_rel=0.02289128489792347, ref_abs_avg=23.287994384765625, test_abs_avg=23.288942337036133
production_forward2 grad[77] vs paper_forward: mean_abs=0.40941429138183594, max_abs=1.8125, mean_rel=0.0946609377861023, max_rel=6.803074359893799, norm_rel=0.021878628060221672, ref_abs_avg=18.47771453857422, test_abs_avg=18.5051326751709
production_forward2 grad[78] vs paper_forward: mean_abs=0.5260473489761353, max_abs=4.25, mean_rel=0.14384537935256958, max_rel=480.9143371582031, norm_rel=0.023789795115590096, ref_abs_avg=22.137359619140625, test_abs_avg=22.136722564697266
production_forward2 grad[79] vs paper_forward: mean_abs=0.48889774084091187, max_abs=3.75, mean_rel=0.2113284170627594, max_rel=1250.0, norm_rel=0.02263449691236019, ref_abs_avg=21.693626403808594, test_abs_avg=21.70268440246582
production_forward2 grad[80] vs paper_forward: mean_abs=0.3812694549560547, max_abs=1.625, mean_rel=0.06325623393058777, max_rel=3.1278200149536133, norm_rel=0.02271849662065506, ref_abs_avg=17.23468589782715, test_abs_avg=17.243099212646484
production_forward2 grad[81] vs paper_forward: mean_abs=0.4837799072265625, max_abs=5.0, mean_rel=0.1484089195728302, max_rel=793.2633056640625, norm_rel=0.023277509957551956, ref_abs_avg=20.85049057006836, test_abs_avg=20.851165771484375
production_forward2 grad[82] vs paper_forward: mean_abs=0.4518340229988098, max_abs=3.8125, mean_rel=0.2138252854347229, max_rel=1468.7498779296875, norm_rel=0.02169385552406311, ref_abs_avg=20.76464080810547, test_abs_avg=20.76896858215332
production_forward2 grad[83] vs paper_forward: mean_abs=0.3278770446777344, max_abs=1.25, mean_rel=0.13410848379135132, max_rel=16.05437469482422, norm_rel=0.020548274740576744, ref_abs_avg=15.80752944946289, test_abs_avg=15.805610656738281
production_forward2 grad[84] vs paper_forward: mean_abs=0.45851829648017883, max_abs=5.0, mean_rel=0.13937655091285706, max_rel=768.4234619140625, norm_rel=0.022526483982801437, ref_abs_avg=20.423803329467773, test_abs_avg=20.42552947998047
production_forward2 grad[85] vs paper_forward: mean_abs=0.4145107865333557, max_abs=4.0, mean_rel=0.18856123089790344, max_rel=1593.7498779296875, norm_rel=0.020460568368434906, ref_abs_avg=20.286243438720703, test_abs_avg=20.282140731811523
production_forward2 grad[86] vs paper_forward: mean_abs=0.3381919860839844, max_abs=1.5, mean_rel=0.0897628664970398, max_rel=5.936955451965332, norm_rel=0.020096246153116226, ref_abs_avg=16.7191104888916, test_abs_avg=16.704627990722656
production_forward2 grad[87] vs paper_forward: mean_abs=0.4338127374649048, max_abs=4.0, mean_rel=0.14242394268512726, max_rel=1353.918212890625, norm_rel=0.022493531927466393, ref_abs_avg=19.392419815063477, test_abs_avg=19.391273498535156
production_forward2 grad[88] vs paper_forward: mean_abs=0.40276259183883667, max_abs=4.0, mean_rel=0.19434663653373718, max_rel=1226.5625, norm_rel=0.02129313535988331, ref_abs_avg=19.07407569885254, test_abs_avg=19.070283889770508
production_forward2 grad[89] vs paper_forward: mean_abs=0.3222651481628418, max_abs=1.375, mean_rel=0.09383776038885117, max_rel=7.681109428405762, norm_rel=0.020886491984128952, ref_abs_avg=15.768205642700195, test_abs_avg=15.770414352416992
production_forward2 grad[90] vs paper_forward: mean_abs=0.40581464767456055, max_abs=4.0, mean_rel=0.13219013810157776, max_rel=389.99542236328125, norm_rel=0.02169518545269966, ref_abs_avg=18.862621307373047, test_abs_avg=18.862869262695312
production_forward2 grad[91] vs paper_forward: mean_abs=0.37594369053840637, max_abs=4.0, mean_rel=0.16684474050998688, max_rel=874.9999389648438, norm_rel=0.020584939047694206, ref_abs_avg=18.42877197265625, test_abs_avg=18.434673309326172
production_forward2 grad[92] vs paper_forward: mean_abs=0.30406635999679565, max_abs=1.15625, mean_rel=0.15132196247577667, max_rel=44.79315185546875, norm_rel=0.019646495580673218, ref_abs_avg=16.013463973999023, test_abs_avg=15.994158744812012
production_forward2 grad[93] vs paper_forward: mean_abs=0.3900204598903656, max_abs=5.0, mean_rel=0.1270671784877777, max_rel=452.8345947265625, norm_rel=0.02140078879892826, ref_abs_avg=18.43975067138672, test_abs_avg=18.437969207763672
production_forward2 grad[94] vs paper_forward: mean_abs=0.34810084104537964, max_abs=3.75, mean_rel=0.15695726871490479, max_rel=1062.5, norm_rel=0.0192633718252182, ref_abs_avg=18.2755126953125, test_abs_avg=18.266498565673828
production_forward2 grad[95] vs paper_forward: mean_abs=0.28409624099731445, max_abs=1.125, mean_rel=0.07845275849103928, max_rel=6.754331588745117, norm_rel=0.01888086274266243, ref_abs_avg=15.243175506591797, test_abs_avg=15.232295989990234
production_forward2 grad[96] vs paper_forward: mean_abs=0.3593784272670746, max_abs=4.0, mean_rel=0.1267668902873993, max_rel=628.427490234375, norm_rel=0.021102655678987503, ref_abs_avg=17.31606674194336, test_abs_avg=17.31464385986328
production_forward2 grad[97] vs paper_forward: mean_abs=0.3338397443294525, max_abs=4.0, mean_rel=0.15508130192756653, max_rel=1312.4998779296875, norm_rel=0.019444439560174942, ref_abs_avg=17.558147430419922, test_abs_avg=17.545896530151367
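
The fields in the gradient lines above are not defined anywhere in this log. The sketch below gives one plausible set of definitions. It assumes that `norm_rel` is the L2 norm of the difference divided by the L2 norm of the reference, and that the relative errors are elementwise with a small epsilon guard. The function name `compare_stats` and the `eps` value are hypothetical. The real script presumably operates on torch tensors rather than Python lists.

```python
import math

def compare_stats(test, ref, eps=1e-12):
    """Hypothetical reconstruction of the per-tensor comparison fields.

    `test` and `ref` are flat sequences of floats of equal length.
    These definitions are assumptions, not taken from the benchmark script.
    """
    if len(test) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ref)
    abs_err = [abs(t - r) for t, r in zip(test, ref)]
    # Elementwise relative error; eps guards against division by zero.
    rel_err = [e / (abs(r) + eps) for e, r in zip(abs_err, ref)]
    diff_norm = math.sqrt(sum(e * e for e in abs_err))
    ref_norm = math.sqrt(sum(r * r for r in ref))
    return {
        "mean_abs": sum(abs_err) / n,
        "max_abs": max(abs_err),
        "mean_rel": sum(rel_err) / n,
        "max_rel": max(rel_err),
        "norm_rel": diff_norm / (ref_norm + eps),
        "ref_abs_avg": sum(abs(r) for r in ref) / n,
        "test_abs_avg": sum(abs(t) for t in test) / n,
    }

# One tiny reference entry dominates max_rel while norm_rel stays moderate.
stats = compare_stats([1.0, 2.0, 0.002], [1.0, 2.5, 0.001])
```

Under these assumed definitions, a single near-zero reference entry inflates `max_rel` without moving `norm_rel` much. That would be consistent with lines above that report `max_rel` in the thousands next to `norm_rel` near 0.02.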
identity layers + randn queries
paper_forward fwd+bwd:  382.543 ms
paper_forward bwd-only: 302.398 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.740 GiB, fwd+bwd=32.490 GiB
production_forward2 fwd+bwd:  114.293 ms
production_forward2 bwd-only: 95.968 ms
production_forward2 peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=2.320 GiB, fwd+bwd=10.320 GiB
production_forward fwd+bwd:  114.951 ms
production_forward bwd-only: 95.968 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.320 GiB, fwd+bwd=10.320 GiB
torch_compile_phases_forward fwd+bwd:  166.013 ms
torch_compile_phases_forward bwd-only: 132.732 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB
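
The timing and memory lines above are easier to compare as ratios against `paper_forward`. The sketch below copies the fwd+bwd time, the bwd-only time, and the fwd+bwd peak allocated memory from those lines and divides the baseline by each variant. The dictionary layout and the helper name `relative_to_baseline` are illustrative and not part of the benchmark script.

```python
# Figures copied from the log lines above: times in ms, memory in GiB
# (peak allocated, fwd+bwd).
results = {
    "paper_forward": {
        "fwd_bwd_ms": 382.543, "bwd_ms": 302.398, "peak_alloc_gib": 31.825},
    "production_forward2": {
        "fwd_bwd_ms": 114.293, "bwd_ms": 95.968, "peak_alloc_gib": 10.196},
    "production_forward": {
        "fwd_bwd_ms": 114.951, "bwd_ms": 95.968, "peak_alloc_gib": 10.196},
    "torch_compile_phases_forward": {
        "fwd_bwd_ms": 166.013, "bwd_ms": 132.732, "peak_alloc_gib": 13.409},
}

def relative_to_baseline(results, baseline="paper_forward"):
    """Return baseline/variant ratios; values above 1 mean the variant is cheaper."""
    base = results[baseline]
    return {
        name: {key: base[key] / vals[key] for key in base}
        for name, vals in results.items()
        if name != baseline
    }

ratios = relative_to_baseline(results)
for name, r in ratios.items():
    print(
        f"{name}: {r['fwd_bwd_ms']:.2f}x fwd+bwd, "
        f"{r['bwd_ms']:.2f}x bwd-only, "
        f"{r['peak_alloc_gib']:.2f}x peak allocated"
    )
```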

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016249200562015176, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008306545205414295, max_abs=0.37109375, mean_rel=0.07174129784107208, max_rel=143.74325561523438, norm_rel=0.01966191455721855, ref_abs_avg=0.45904994010925293, test_abs_avg=0.4590739607810974
production_forward grad[1] vs paper_forward: mean_abs=7.172598361968994, max_abs=52.0, mean_rel=0.23086382448673248, max_rel=1232.5377197265625, norm_rel=0.020116547122597694, ref_abs_avg=314.73284912109375, test_abs_avg=314.7210388183594
production_forward grad[2] vs paper_forward: mean_abs=1.2512116432189941, max_abs=4.5, mean_rel=0.09670110791921616, max_rel=12.483355522155762, norm_rel=0.02360294759273529, ref_abs_avg=53.657623291015625, test_abs_avg=53.58723831176758
production_forward grad[3] vs paper_forward: mean_abs=1.609321117401123, max_abs=12.0, mean_rel=0.17265847325325012, max_rel=2893.66748046875, norm_rel=0.024210026487708092, ref_abs_avg=66.95124053955078, test_abs_avg=66.95579528808594
production_forward grad[4] vs paper_forward: mean_abs=1.4881222248077393, max_abs=9.0, mean_rel=0.40187299251556396, max_rel=4437.5, norm_rel=0.022610699757933617, ref_abs_avg=66.2353515625, test_abs_avg=66.2419662475586
production_forward grad[5] vs paper_forward: mean_abs=1.095576286315918, max_abs=4.375, mean_rel=0.12355822324752808, max_rel=20.909807205200195, norm_rel=0.021551137790083885, ref_abs_avg=50.77986526489258, test_abs_avg=50.70476531982422
production_forward grad[6] vs paper_forward: mean_abs=1.3977489471435547, max_abs=9.0, mean_rel=0.16459617018699646, max_rel=2655.027587890625, norm_rel=0.023900270462036133, ref_abs_avg=58.918357849121094, test_abs_avg=58.922279357910156
production_forward grad[7] vs paper_forward: mean_abs=1.2849594354629517, max_abs=7.5, mean_rel=0.38772937655448914, max_rel=3374.999755859375, norm_rel=0.022171499207615852, ref_abs_avg=58.356788635253906, test_abs_avg=58.360843658447266
production_forward grad[8] vs paper_forward: mean_abs=1.0180931091308594, max_abs=4.375, mean_rel=0.09521771967411041, max_rel=8.896187782287598, norm_rel=0.02359381690621376, ref_abs_avg=44.139198303222656, test_abs_avg=44.12858200073242
production_forward grad[9] vs paper_forward: mean_abs=1.2720451354980469, max_abs=9.0, mean_rel=0.15708285570144653, max_rel=1719.86376953125, norm_rel=0.02381480485200882, ref_abs_avg=53.80351638793945, test_abs_avg=53.80965042114258
production_forward grad[10] vs paper_forward: mean_abs=1.1742055416107178, max_abs=7.0, mean_rel=0.32917582988739014, max_rel=3312.499755859375, norm_rel=0.02216903492808342, ref_abs_avg=53.212928771972656, test_abs_avg=53.225181579589844
production_forward grad[11] vs paper_forward: mean_abs=0.9403972625732422, max_abs=4.0, mean_rel=0.18043124675750732, max_rel=28.921911239624023, norm_rel=0.023637570440769196, ref_abs_avg=39.82054901123047, test_abs_avg=39.72458267211914
production_forward grad[12] vs paper_forward: mean_abs=1.1800508499145508, max_abs=9.0, mean_rel=0.15277567505836487, max_rel=1199.2994384765625, norm_rel=0.023595118895173073, ref_abs_avg=50.37930679321289, test_abs_avg=50.38459777832031
production_forward grad[13] vs paper_forward: mean_abs=1.0800089836120605, max_abs=6.75, mean_rel=0.31774720549583435, max_rel=3874.999755859375, norm_rel=0.021917227655649185, ref_abs_avg=49.530914306640625, test_abs_avg=49.53687286376953
production_forward grad[14] vs paper_forward: mean_abs=0.8394813537597656, max_abs=3.75, mean_rel=0.08390162140130997, max_rel=7.075746536254883, norm_rel=0.02257753349840641, ref_abs_avg=37.634517669677734, test_abs_avg=37.72545623779297
production_forward grad[15] vs paper_forward: mean_abs=1.0897717475891113, max_abs=8.0, mean_rel=0.14534097909927368, max_rel=961.0870361328125, norm_rel=0.023493692278862, ref_abs_avg=46.71833801269531, test_abs_avg=46.722267150878906
production_forward grad[16] vs paper_forward: mean_abs=1.004586935043335, max_abs=6.0, mean_rel=0.2934938371181488, max_rel=3656.249755859375, norm_rel=0.02177765592932701, ref_abs_avg=46.41313934326172, test_abs_avg=46.416748046875
production_forward grad[17] vs paper_forward: mean_abs=0.8279609680175781, max_abs=3.0, mean_rel=0.07959480583667755, max_rel=6.260652542114258, norm_rel=0.02348591387271881, ref_abs_avg=35.599334716796875, test_abs_avg=35.58278274536133
production_forward grad[18] vs paper_forward: mean_abs=1.034170150756836, max_abs=7.375, mean_rel=0.14192867279052734, max_rel=579.8646240234375, norm_rel=0.02329239249229431, ref_abs_avg=44.66448211669922, test_abs_avg=44.667633056640625
production_forward grad[19] vs paper_forward: mean_abs=0.9456376433372498, max_abs=5.59375, mean_rel=0.27580827474594116, max_rel=2218.75, norm_rel=0.0218039657920599, ref_abs_avg=43.53199768066406, test_abs_avg=43.53443145751953
production_forward grad[20] vs paper_forward: mean_abs=0.7743062973022461, max_abs=3.25, mean_rel=0.09226440638303757, max_rel=10.454981803894043, norm_rel=0.02213202603161335, ref_abs_avg=35.98912048339844, test_abs_avg=35.99641418457031
production_forward grad[21] vs paper_forward: mean_abs=0.9808015823364258, max_abs=7.0, mean_rel=0.15404967963695526, max_rel=953.7110595703125, norm_rel=0.02326895296573639, ref_abs_avg=42.420265197753906, test_abs_avg=42.424560546875
production_forward grad[22] vs paper_forward: mean_abs=0.9004380702972412, max_abs=6.125, mean_rel=0.3000533878803253, max_rel=3124.999755859375, norm_rel=0.021656405180692673, ref_abs_avg=41.7896728515625, test_abs_avg=41.79241180419922
production_forward grad[23] vs paper_forward: mean_abs=0.7263975143432617, max_abs=2.75, mean_rel=0.06337577104568481, max_rel=1.9637550115585327, norm_rel=0.021860092878341675, ref_abs_avg=33.269832611083984, test_abs_avg=33.25592041015625
production_forward grad[24] vs paper_forward: mean_abs=0.93125319480896, max_abs=8.0, mean_rel=0.16003422439098358, max_rel=1757.6077880859375, norm_rel=0.023108042776584625, ref_abs_avg=40.59735107421875, test_abs_avg=40.59941482543945
production_forward grad[25] vs paper_forward: mean_abs=0.8551899194717407, max_abs=5.0, mean_rel=0.23747007548809052, max_rel=3156.249755859375, norm_rel=0.021395450457930565, ref_abs_avg=40.18357849121094, test_abs_avg=40.183387756347656
production_forward grad[26] vs paper_forward: mean_abs=0.8004405498504639, max_abs=4.4375, mean_rel=0.11788035929203033, max_rel=14.338301658630371, norm_rel=0.02302624098956585, ref_abs_avg=35.70734405517578, test_abs_avg=35.71240234375
production_forward grad[27] vs paper_forward: mean_abs=1.0741682052612305, max_abs=10.0, mean_rel=0.17397567629814148, max_rel=1665.4246826171875, norm_rel=0.024959323927760124, ref_abs_avg=43.27574920654297, test_abs_avg=43.279144287109375
production_forward grad[28] vs paper_forward: mean_abs=0.992914080619812, max_abs=6.90625, mean_rel=0.2985950708389282, max_rel=2906.249755859375, norm_rel=0.023293130099773407, ref_abs_avg=42.773773193359375, test_abs_avg=42.77903366088867
production_forward grad[29] vs paper_forward: mean_abs=0.7986912727355957, max_abs=3.125, mean_rel=0.1331602782011032, max_rel=14.904444694519043, norm_rel=0.024667538702487946, ref_abs_avg=32.54298400878906, test_abs_avg=32.55059051513672
production_forward grad[30] vs paper_forward: mean_abs=0.9995612502098083, max_abs=7.0, mean_rel=0.17220503091812134, max_rel=1576.409423828125, norm_rel=0.025390418246388435, ref_abs_avg=39.6002082824707, test_abs_avg=39.60150909423828
production_forward grad[31] vs paper_forward: mean_abs=0.9323668479919434, max_abs=6.0, mean_rel=0.3473207354545593, max_rel=2749.999755859375, norm_rel=0.02384822815656662, ref_abs_avg=39.19947052001953, test_abs_avg=39.20320129394531
production_forward grad[32] vs paper_forward: mean_abs=0.717207670211792, max_abs=3.1875, mean_rel=0.1098640039563179, max_rel=17.369813919067383, norm_rel=0.02459469996392727, ref_abs_avg=30.383373260498047, test_abs_avg=30.419145584106445
production_forward grad[33] vs paper_forward: mean_abs=0.9313628673553467, max_abs=6.0, mean_rel=0.17155420780181885, max_rel=1516.0369873046875, norm_rel=0.02518024854362011, ref_abs_avg=37.12568283081055, test_abs_avg=37.12870788574219
production_forward grad[34] vs paper_forward: mean_abs=0.8731162548065186, max_abs=5.25, mean_rel=0.33678317070007324, max_rel=3156.249755859375, norm_rel=0.024019014090299606, ref_abs_avg=36.418251037597656, test_abs_avg=36.42169952392578
production_forward grad[35] vs paper_forward: mean_abs=0.6830329895019531, max_abs=2.75, mean_rel=0.246966153383255, max_rel=74.83293151855469, norm_rel=0.024378644302487373, ref_abs_avg=28.30740737915039, test_abs_avg=28.28736114501953
production_forward grad[36] vs paper_forward: mean_abs=0.8742415904998779, max_abs=6.0, mean_rel=0.17181377112865448, max_rel=2485.587158203125, norm_rel=0.025076311081647873, ref_abs_avg=35.04063415527344, test_abs_avg=35.041378021240234
production_forward grad[37] vs paper_forward: mean_abs=0.812264621257782, max_abs=4.921875, mean_rel=0.25438565015792847, max_rel=2125.0, norm_rel=0.023585913702845573, ref_abs_avg=34.50562286376953, test_abs_avg=34.51274871826172
production_forward grad[38] vs paper_forward: mean_abs=0.6240799427032471, max_abs=2.5, mean_rel=0.43621352314949036, max_rel=187.72401428222656, norm_rel=0.02324431948363781, ref_abs_avg=27.21369171142578, test_abs_avg=27.230688095092773
production_forward grad[39] vs paper_forward: mean_abs=0.8272293210029602, max_abs=6.0, mean_rel=0.1609855592250824, max_rel=1405.8453369140625, norm_rel=0.024842804297804832, ref_abs_avg=33.41119384765625, test_abs_avg=33.41373825073242
production_forward grad[40] vs paper_forward: mean_abs=0.7738105058670044, max_abs=4.75, mean_rel=0.3094387352466583, max_rel=2812.499755859375, norm_rel=0.023526886478066444, ref_abs_avg=33.01679229736328, test_abs_avg=33.01323318481445
production_forward grad[41] vs paper_forward: mean_abs=0.6153030395507812, max_abs=2.5, mean_rel=0.11670680344104767, max_rel=5.89670991897583, norm_rel=0.023624354973435402, ref_abs_avg=25.18268585205078, test_abs_avg=25.198631286621094
production_forward grad[42] vs paper_forward: mean_abs=0.7817599773406982, max_abs=6.5, mean_rel=0.16124242544174194, max_rel=1529.216796875, norm_rel=0.02462269552052021, ref_abs_avg=31.831653594970703, test_abs_avg=31.83371925354004
production_forward grad[43] vs paper_forward: mean_abs=0.7311904430389404, max_abs=4.5, mean_rel=0.2979790270328522, max_rel=3656.249755859375, norm_rel=0.023067036643624306, ref_abs_avg=31.732929229736328, test_abs_avg=31.73493194580078
production_forward grad[44] vs paper_forward: mean_abs=0.5895910263061523, max_abs=2.75, mean_rel=0.08538729697465897, max_rel=4.211636543273926, norm_rel=0.024023691192269325, ref_abs_avg=24.55997085571289, test_abs_avg=24.58806610107422
production_forward grad[45] vs paper_forward: mean_abs=0.7489688396453857, max_abs=6.0, mean_rel=0.15901967883110046, max_rel=1081.2301025390625, norm_rel=0.024424521252512932, ref_abs_avg=30.75070571899414, test_abs_avg=30.75277328491211
production_forward grad[46] vs paper_forward: mean_abs=0.6950820088386536, max_abs=4.5, mean_rel=0.2289971113204956, max_rel=1687.4998779296875, norm_rel=0.02315494976937771, ref_abs_avg=30.123849868774414, test_abs_avg=30.120908737182617
production_forward grad[47] vs paper_forward: mean_abs=0.5657107830047607, max_abs=2.25, mean_rel=0.1016082614660263, max_rel=3.732009172439575, norm_rel=0.024096651002764702, ref_abs_avg=23.33643341064453, test_abs_avg=23.343013763427734
production_forward grad[48] vs paper_forward: mean_abs=0.7114270925521851, max_abs=6.0, mean_rel=0.1542668640613556, max_rel=1040.0006103515625, norm_rel=0.024274714291095734, ref_abs_avg=29.413192749023438, test_abs_avg=29.414085388183594
production_forward grad[49] vs paper_forward: mean_abs=0.6606453657150269, max_abs=4.625, mean_rel=0.24080640077590942, max_rel=2062.5, norm_rel=0.02254238724708557, ref_abs_avg=29.36162757873535, test_abs_avg=29.369338989257812
production_forward grad[50] vs paper_forward: mean_abs=0.6273584365844727, max_abs=2.4375, mean_rel=0.09150325506925583, max_rel=14.498266220092773, norm_rel=0.024370545521378517, ref_abs_avg=25.717899322509766, test_abs_avg=25.665300369262695
production_forward grad[51] vs paper_forward: mean_abs=0.7983205914497375, max_abs=5.5, mean_rel=0.17305578291416168, max_rel=1672.256103515625, norm_rel=0.02569781243801117, ref_abs_avg=31.173641204833984, test_abs_avg=31.176849365234375
production_forward grad[52] vs paper_forward: mean_abs=0.7347675561904907, max_abs=5.109375, mean_rel=0.26268404722213745, max_rel=1937.4998779296875, norm_rel=0.023998813703656197, ref_abs_avg=30.651351928710938, test_abs_avg=30.648380279541016
production_forward grad[53] vs paper_forward: mean_abs=0.5749092102050781, max_abs=2.03125, mean_rel=0.06121927127242088, max_rel=2.0676896572113037, norm_rel=0.022682061418890953, ref_abs_avg=25.332473754882812, test_abs_avg=25.333820343017578
production_forward grad[54] vs paper_forward: mean_abs=0.7402133345603943, max_abs=6.5, mean_rel=0.163090318441391, max_rel=620.0584106445312, norm_rel=0.025354428216814995, ref_abs_avg=29.237293243408203, test_abs_avg=29.23755645751953
production_forward grad[55] vs paper_forward: mean_abs=0.6893801689147949, max_abs=4.6875, mean_rel=0.24913078546524048, max_rel=2687.499755859375, norm_rel=0.02359846420586109, ref_abs_avg=29.1878662109375, test_abs_avg=29.198833465576172
production_forward grad[56] vs paper_forward: mean_abs=0.5175634622573853, max_abs=2.3125, mean_rel=0.17304876446723938, max_rel=36.045021057128906, norm_rel=0.023579267784953117, ref_abs_avg=22.020174026489258, test_abs_avg=21.98907470703125
production_forward grad[57] vs paper_forward: mean_abs=0.6844199895858765, max_abs=5.0, mean_rel=0.17218968272209167, max_rel=976.7609252929688, norm_rel=0.024826345965266228, ref_abs_avg=27.658794403076172, test_abs_avg=27.658313751220703
production_forward grad[58] vs paper_forward: mean_abs=0.6347280740737915, max_abs=5.0, mean_rel=0.2932750880718231, max_rel=1953.1248779296875, norm_rel=0.02343028225004673, ref_abs_avg=27.14288330078125, test_abs_avg=27.141338348388672
production_forward grad[59] vs paper_forward: mean_abs=0.50636887550354, max_abs=2.0, mean_rel=0.13348686695098877, max_rel=16.322559356689453, norm_rel=0.02360982820391655, ref_abs_avg=20.925762176513672, test_abs_avg=20.93067169189453
production_forward grad[60] vs paper_forward: mean_abs=0.6406911611557007, max_abs=4.5, mean_rel=0.15192893147468567, max_rel=1127.0201416015625, norm_rel=0.024567466229200363, ref_abs_avg=26.172117233276367, test_abs_avg=26.170520782470703
production_forward grad[61] vs paper_forward: mean_abs=0.5926836729049683, max_abs=4.0625, mean_rel=0.25638771057128906, max_rel=1921.8748779296875, norm_rel=0.02303297072649002, ref_abs_avg=25.738441467285156, test_abs_avg=25.734661102294922
production_forward grad[62] vs paper_forward: mean_abs=0.4483177661895752, max_abs=2.125, mean_rel=0.1225639283657074, max_rel=15.659429550170898, norm_rel=0.021967120468616486, ref_abs_avg=20.494718551635742, test_abs_avg=20.490089416503906
production_forward grad[63] vs paper_forward: mean_abs=0.6056735515594482, max_abs=4.5, mean_rel=0.14982831478118896, max_rel=826.67431640625, norm_rel=0.024214716628193855, ref_abs_avg=25.013874053955078, test_abs_avg=25.012866973876953
production_forward grad[64] vs paper_forward: mean_abs=0.5590997934341431, max_abs=3.875, mean_rel=0.2470184564590454, max_rel=2125.0, norm_rel=0.022382140159606934, ref_abs_avg=24.941043853759766, test_abs_avg=24.943828582763672
production_forward grad[65] vs paper_forward: mean_abs=0.4581540822982788, max_abs=1.625, mean_rel=0.07557525485754013, max_rel=4.740161418914795, norm_rel=0.02313198707997799, ref_abs_avg=19.94070816040039, test_abs_avg=19.926240921020508
production_forward grad[66] vs paper_forward: mean_abs=0.5724322199821472, max_abs=4.5, mean_rel=0.14336548745632172, max_rel=708.910888671875, norm_rel=0.023541631177067757, ref_abs_avg=24.32443618774414, test_abs_avg=24.3249454498291
production_forward grad[67] vs paper_forward: mean_abs=0.5228469371795654, max_abs=4.0, mean_rel=0.23024745285511017, max_rel=1624.9998779296875, norm_rel=0.02194327674806118, ref_abs_avg=23.79714584350586, test_abs_avg=23.802188873291016
production_forward grad[68] vs paper_forward: mean_abs=0.4573831558227539, max_abs=1.75, mean_rel=0.10284462571144104, max_rel=14.291117668151855, norm_rel=0.02340947836637497, ref_abs_avg=19.243696212768555, test_abs_avg=19.237743377685547
production_forward grad[69] vs paper_forward: mean_abs=0.5500643849372864, max_abs=6.0, mean_rel=0.14387032389640808, max_rel=544.8742065429688, norm_rel=0.02316650189459324, ref_abs_avg=23.71072006225586, test_abs_avg=23.711528778076172
production_forward grad[70] vs paper_forward: mean_abs=0.49281686544418335, max_abs=3.5, mean_rel=0.21503981947898865, max_rel=1031.25, norm_rel=0.02124876156449318, ref_abs_avg=23.15892791748047, test_abs_avg=23.160770416259766
production_forward grad[71] vs paper_forward: mean_abs=0.4095425605773926, max_abs=1.5, mean_rel=0.10431265085935593, max_rel=16.57823371887207, norm_rel=0.021882805973291397, ref_abs_avg=18.898651123046875, test_abs_avg=18.891422271728516
production_forward grad[72] vs paper_forward: mean_abs=0.5167454481124878, max_abs=4.875, mean_rel=0.1385786086320877, max_rel=504.10321044921875, norm_rel=0.02286747470498085, ref_abs_avg=22.619670867919922, test_abs_avg=22.619701385498047
production_forward grad[73] vs paper_forward: mean_abs=0.4689045548439026, max_abs=4.5625, mean_rel=0.21713118255138397, max_rel=1593.7498779296875, norm_rel=0.0209704227745533, ref_abs_avg=22.30689811706543, test_abs_avg=22.307331085205078
production_forward grad[74] vs paper_forward: mean_abs=0.43294715881347656, max_abs=1.625, mean_rel=0.09813482314348221, max_rel=8.571538925170898, norm_rel=0.02454446442425251, ref_abs_avg=18.335247039794922, test_abs_avg=18.330312728881836
production_forward grad[75] vs paper_forward: mean_abs=0.5520355701446533, max_abs=4.25, mean_rel=0.1593080759048462, max_rel=1411.5301513671875, norm_rel=0.024849355220794678, ref_abs_avg=22.265291213989258, test_abs_avg=22.265357971191406
production_forward grad[76] vs paper_forward: mean_abs=0.5072794556617737, max_abs=3.9375, mean_rel=0.21663737297058105, max_rel=1718.7498779296875, norm_rel=0.0229216106235981, ref_abs_avg=22.141754150390625, test_abs_avg=22.14522361755371
production_forward grad[77] vs paper_forward: mean_abs=0.3964576721191406, max_abs=1.46484375, mean_rel=0.09210419654846191, max_rel=6.611316204071045, norm_rel=0.02263091877102852, ref_abs_avg=17.57805824279785, test_abs_avg=17.58846092224121
production_forward grad[78] vs paper_forward: mean_abs=0.5138413906097412, max_abs=4.5, mean_rel=0.16260787844657898, max_rel=1213.9764404296875, norm_rel=0.024072112515568733, ref_abs_avg=21.350357055664062, test_abs_avg=21.350292205810547
production_forward grad[79] vs paper_forward: mean_abs=0.4672657251358032, max_abs=3.25, mean_rel=0.23455612361431122, max_rel=1499.9998779296875, norm_rel=0.022538302466273308, ref_abs_avg=20.67032241821289, test_abs_avg=20.68142318725586
production_forward grad[80] vs paper_forward: mean_abs=0.37729722261428833, max_abs=1.65625, mean_rel=0.30329033732414246, max_rel=108.20767211914062, norm_rel=0.02272622473537922, ref_abs_avg=16.57541275024414, test_abs_avg=16.5638427734375
production_forward grad[81] vs paper_forward: mean_abs=0.47277066111564636, max_abs=4.25, mean_rel=0.14824265241622925, max_rel=822.6216430664062, norm_rel=0.023635024204850197, ref_abs_avg=20.048946380615234, test_abs_avg=20.049461364746094
production_forward grad[82] vs paper_forward: mean_abs=0.4363417327404022, max_abs=3.375, mean_rel=0.21180835366249084, max_rel=1624.9998779296875, norm_rel=0.022173341363668442, ref_abs_avg=19.752817153930664, test_abs_avg=19.75592803955078
production_forward grad[83] vs paper_forward: mean_abs=0.3536638021469116, max_abs=1.25, mean_rel=0.0826050266623497, max_rel=6.287319183349609, norm_rel=0.022137325257062912, ref_abs_avg=15.970255851745605, test_abs_avg=15.961563110351562
production_forward grad[84] vs paper_forward: mean_abs=0.4475729465484619, max_abs=5.0, mean_rel=0.14148889482021332, max_rel=755.408935546875, norm_rel=0.02318257838487625, ref_abs_avg=19.36844825744629, test_abs_avg=19.369081497192383
production_forward grad[85] vs paper_forward: mean_abs=0.40526023507118225, max_abs=3.5, mean_rel=0.1904873549938202, max_rel=968.7499389648438, norm_rel=0.021372536197304726, ref_abs_avg=19.009563446044922, test_abs_avg=19.009443283081055
production_forward grad[86] vs paper_forward: mean_abs=0.3076671361923218, max_abs=1.125, mean_rel=0.11395865678787231, max_rel=15.815652847290039, norm_rel=0.01993691734969616, ref_abs_avg=15.638330459594727, test_abs_avg=15.629992485046387
production_forward grad[87] vs paper_forward: mean_abs=0.4131920635700226, max_abs=5.0, mean_rel=0.14286580681800842, max_rel=1812.9461669921875, norm_rel=0.022522971034049988, ref_abs_avg=18.4451961517334, test_abs_avg=18.446149826049805
production_forward grad[88] vs paper_forward: mean_abs=0.3809279799461365, max_abs=2.75, mean_rel=0.20374979078769684, max_rel=1562.4998779296875, norm_rel=0.020693548023700714, ref_abs_avg=18.477798461914062, test_abs_avg=18.475170135498047
production_forward grad[89] vs paper_forward: mean_abs=0.3065621852874756, max_abs=1.375, mean_rel=0.08778079599142075, max_rel=10.30885124206543, norm_rel=0.020158259198069572, ref_abs_avg=15.530938148498535, test_abs_avg=15.541886329650879
production_forward grad[90] vs paper_forward: mean_abs=0.39615994691848755, max_abs=5.0, mean_rel=0.13580557703971863, max_rel=892.9929809570312, norm_rel=0.02213926427066326, ref_abs_avg=18.061786651611328, test_abs_avg=18.06292724609375
production_forward grad[91] vs paper_forward: mean_abs=0.36202576756477356, max_abs=3.5, mean_rel=0.16590747237205505, max_rel=1531.2498779296875, norm_rel=0.020302969962358475, ref_abs_avg=17.920867919921875, test_abs_avg=17.922698974609375
production_forward grad[92] vs paper_forward: mean_abs=0.28250694274902344, max_abs=1.125, mean_rel=0.10377135872840881, max_rel=18.424165725708008, norm_rel=0.020879477262496948, ref_abs_avg=13.803266525268555, test_abs_avg=13.806119918823242
production_forward grad[93] vs paper_forward: mean_abs=0.3718219995498657, max_abs=4.0, mean_rel=0.1323966085910797, max_rel=592.7666625976562, norm_rel=0.02193814143538475, ref_abs_avg=17.142784118652344, test_abs_avg=17.143009185791016
production_forward grad[94] vs paper_forward: mean_abs=0.3445259928703308, max_abs=3.34375, mean_rel=0.1970134675502777, max_rel=1843.7498779296875, norm_rel=0.02055184915661812, ref_abs_avg=17.063758850097656, test_abs_avg=17.060325622558594
production_forward grad[95] vs paper_forward: mean_abs=0.29377782344818115, max_abs=1.3125, mean_rel=0.10904556512832642, max_rel=23.159521102905273, norm_rel=0.02062632143497467, ref_abs_avg=14.096046447753906, test_abs_avg=14.106295585632324
production_forward grad[96] vs paper_forward: mean_abs=0.3538498878479004, max_abs=4.453125, mean_rel=0.11901307851076126, max_rel=367.0726623535156, norm_rel=0.021177014335989952, ref_abs_avg=16.993671417236328, test_abs_avg=16.992385864257812
production_forward grad[97] vs paper_forward: mean_abs=0.32229334115982056, max_abs=3.0625, mean_rel=0.1782553493976593, max_rel=1515.6248779296875, norm_rel=0.01936432160437107, ref_abs_avg=16.94099998474121, test_abs_avg=16.937393188476562
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016270395135506988, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00865190476179123, max_abs=0.390625, mean_rel=0.0743514746427536, max_rel=105.6002426147461, norm_rel=0.020357457920908928, ref_abs_avg=0.45904994010925293, test_abs_avg=0.459061861038208
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.299896717071533, max_abs=56.0, mean_rel=0.26806336641311646, max_rel=1747.7435302734375, norm_rel=0.020475000143051147, ref_abs_avg=314.73284912109375, test_abs_avg=314.6832580566406
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2504682540893555, max_abs=4.625, mean_rel=0.0914938896894455, max_rel=7.129383087158203, norm_rel=0.02340591885149479, ref_abs_avg=53.657623291015625, test_abs_avg=53.650726318359375
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.657029628753662, max_abs=11.0, mean_rel=0.17627492547035217, max_rel=1620.0592041015625, norm_rel=0.02491259016096592, ref_abs_avg=66.95124053955078, test_abs_avg=66.95380401611328
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.538065791130066, max_abs=10.0, mean_rel=0.3896900415420532, max_rel=4687.5, norm_rel=0.02335197478532791, ref_abs_avg=66.2353515625, test_abs_avg=66.24301147460938
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1378626823425293, max_abs=4.5, mean_rel=0.10177019983530045, max_rel=8.94729995727539, norm_rel=0.02192823402583599, ref_abs_avg=50.77986526489258, test_abs_avg=50.73611068725586
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.440704107284546, max_abs=9.5, mean_rel=0.1640051305294037, max_rel=2121.04541015625, norm_rel=0.024600673466920853, ref_abs_avg=58.918357849121094, test_abs_avg=58.92240905761719
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3327364921569824, max_abs=8.0, mean_rel=0.4272506833076477, max_rel=3874.999755859375, norm_rel=0.022991498932242393, ref_abs_avg=58.356788635253906, test_abs_avg=58.35868835449219
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.004709243774414, max_abs=4.0, mean_rel=0.09496498852968216, max_rel=5.450552463531494, norm_rel=0.023220067843794823, ref_abs_avg=44.139198303222656, test_abs_avg=44.150787353515625
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3087489604949951, max_abs=10.0, mean_rel=0.16190242767333984, max_rel=1673.071533203125, norm_rel=0.024471987038850784, ref_abs_avg=53.80351638793945, test_abs_avg=53.806663513183594
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.212805986404419, max_abs=7.5, mean_rel=0.35771358013153076, max_rel=4375.0, norm_rel=0.022898847237229347, ref_abs_avg=53.212928771972656, test_abs_avg=53.22113800048828
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9158234596252441, max_abs=3.5, mean_rel=0.12492238730192184, max_rel=11.368903160095215, norm_rel=0.023197371512651443, ref_abs_avg=39.82054901123047, test_abs_avg=39.72710418701172
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2115132808685303, max_abs=8.0, mean_rel=0.1602126657962799, max_rel=2407.8720703125, norm_rel=0.024199998006224632, ref_abs_avg=50.37930679321289, test_abs_avg=50.38223648071289
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1124752759933472, max_abs=7.0, mean_rel=0.3386044204235077, max_rel=5999.99951171875, norm_rel=0.022565539926290512, ref_abs_avg=49.530914306640625, test_abs_avg=49.53601837158203
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8937606811523438, max_abs=3.25, mean_rel=0.09595981240272522, max_rel=8.297457695007324, norm_rel=0.023999474942684174, ref_abs_avg=37.634517669677734, test_abs_avg=37.719722747802734
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1193037033081055, max_abs=8.0, mean_rel=0.15356385707855225, max_rel=1757.2791748046875, norm_rel=0.024098578840494156, ref_abs_avg=46.71833801269531, test_abs_avg=46.71997833251953
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0323961973190308, max_abs=6.125, mean_rel=0.30966779589653015, max_rel=4062.499755859375, norm_rel=0.022382160648703575, ref_abs_avg=46.41313934326172, test_abs_avg=46.41637420654297
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8399839401245117, max_abs=3.625, mean_rel=0.09669949859380722, max_rel=7.470174789428711, norm_rel=0.02377256564795971, ref_abs_avg=35.599334716796875, test_abs_avg=35.61008071899414
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0603753328323364, max_abs=7.0, mean_rel=0.14366930723190308, max_rel=631.7823486328125, norm_rel=0.023860333487391472, ref_abs_avg=44.66448211669922, test_abs_avg=44.667091369628906
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9709417819976807, max_abs=6.0, mean_rel=0.28377893567085266, max_rel=2781.249755859375, norm_rel=0.02237747423350811, ref_abs_avg=43.53199768066406, test_abs_avg=43.53695297241211
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7951688766479492, max_abs=3.0, mean_rel=0.09973739832639694, max_rel=12.893811225891113, norm_rel=0.022248366847634315, ref_abs_avg=35.98912048339844, test_abs_avg=36.008872985839844
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0016865730285645, max_abs=8.0, mean_rel=0.15467950701713562, max_rel=1025.2769775390625, norm_rel=0.023771485313773155, ref_abs_avg=42.420265197753906, test_abs_avg=42.420169830322266
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9217160940170288, max_abs=6.5, mean_rel=0.30033397674560547, max_rel=2624.999755859375, norm_rel=0.022180955857038498, ref_abs_avg=41.7896728515625, test_abs_avg=41.79193115234375
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7591733932495117, max_abs=2.75, mean_rel=0.07603225111961365, max_rel=4.426670074462891, norm_rel=0.023041116073727608, ref_abs_avg=33.269832611083984, test_abs_avg=33.25273132324219
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9508049488067627, max_abs=6.5, mean_rel=0.15955084562301636, max_rel=1787.65625, norm_rel=0.02357235737144947, ref_abs_avg=40.59735107421875, test_abs_avg=40.599700927734375
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8741143941879272, max_abs=5.25, mean_rel=0.22954532504081726, max_rel=2093.75, norm_rel=0.021849751472473145, ref_abs_avg=40.18357849121094, test_abs_avg=40.18071365356445
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8295412063598633, max_abs=3.125, mean_rel=0.1499406397342682, max_rel=19.488880157470703, norm_rel=0.023694580420851707, ref_abs_avg=35.70734405517578, test_abs_avg=35.71991729736328
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0986590385437012, max_abs=7.5, mean_rel=0.1790916621685028, max_rel=1679.6583251953125, norm_rel=0.025517072528600693, ref_abs_avg=43.27574920654297, test_abs_avg=43.27721405029297
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0238251686096191, max_abs=7.25, mean_rel=0.2914804220199585, max_rel=2312.5, norm_rel=0.024018941447138786, ref_abs_avg=42.773773193359375, test_abs_avg=42.77562713623047
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8076915740966797, max_abs=3.0, mean_rel=0.11509981006383896, max_rel=6.803074359893799, norm_rel=0.02491665817797184, ref_abs_avg=32.54298400878906, test_abs_avg=32.57096481323242
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0196106433868408, max_abs=6.5, mean_rel=0.17664246261119843, max_rel=1481.516357421875, norm_rel=0.025898033753037453, ref_abs_avg=39.6002082824707, test_abs_avg=39.60191345214844
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9504404664039612, max_abs=5.5, mean_rel=0.3340510129928589, max_rel=2437.5, norm_rel=0.024318572133779526, ref_abs_avg=39.19947052001953, test_abs_avg=39.201969146728516
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7321915626525879, max_abs=2.84375, mean_rel=0.14076225459575653, max_rel=23.894062042236328, norm_rel=0.024936484172940254, ref_abs_avg=30.383373260498047, test_abs_avg=30.409337997436523
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9490004181861877, max_abs=6.0, mean_rel=0.17699381709098816, max_rel=2967.12255859375, norm_rel=0.02565844915807247, ref_abs_avg=37.12568283081055, test_abs_avg=37.12821960449219
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8947137594223022, max_abs=5.5, mean_rel=0.3605448603630066, max_rel=3312.499755859375, norm_rel=0.02460547350347042, ref_abs_avg=36.418251037597656, test_abs_avg=36.42029571533203
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7140359878540039, max_abs=3.25, mean_rel=0.2527521550655365, max_rel=71.4915542602539, norm_rel=0.025188207626342773, ref_abs_avg=28.30740737915039, test_abs_avg=28.26030921936035
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8909922242164612, max_abs=6.0, mean_rel=0.1714497059583664, max_rel=2155.989013671875, norm_rel=0.025530708953738213, ref_abs_avg=35.04063415527344, test_abs_avg=35.040252685546875
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8324083089828491, max_abs=4.84375, mean_rel=0.24693357944488525, max_rel=2015.6248779296875, norm_rel=0.024170542135834694, ref_abs_avg=34.50562286376953, test_abs_avg=34.51345443725586
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6215202808380127, max_abs=2.75, mean_rel=0.3391070067882538, max_rel=135.00115966796875, norm_rel=0.023076286539435387, ref_abs_avg=27.21369171142578, test_abs_avg=27.24885368347168
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8394315838813782, max_abs=6.0, mean_rel=0.16075468063354492, max_rel=1001.5881958007812, norm_rel=0.025199614465236664, ref_abs_avg=33.41119384765625, test_abs_avg=33.413047790527344
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7897050380706787, max_abs=4.75, mean_rel=0.30934005975723267, max_rel=2468.75, norm_rel=0.02398918755352497, ref_abs_avg=33.01679229736328, test_abs_avg=33.012664794921875
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6317071914672852, max_abs=2.75, mean_rel=0.10289084911346436, max_rel=5.034790515899658, norm_rel=0.024025948718190193, ref_abs_avg=25.18268585205078, test_abs_avg=25.18648910522461
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.792635440826416, max_abs=6.0, mean_rel=0.1621105670928955, max_rel=817.5575561523438, norm_rel=0.024966614320874214, ref_abs_avg=31.831653594970703, test_abs_avg=31.833219528198242
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7428794503211975, max_abs=4.75, mean_rel=0.2974417805671692, max_rel=3281.249755859375, norm_rel=0.023430686444044113, ref_abs_avg=31.732929229736328, test_abs_avg=31.728918075561523
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5852687358856201, max_abs=2.25, mean_rel=0.08672738075256348, max_rel=5.486937999725342, norm_rel=0.023895740509033203, ref_abs_avg=24.55997085571289, test_abs_avg=24.56447982788086
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7587150931358337, max_abs=5.0, mean_rel=0.16471630334854126, max_rel=1441.539794921875, norm_rel=0.02473919838666916, ref_abs_avg=30.75070571899414, test_abs_avg=30.751779556274414
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7062801122665405, max_abs=4.5, mean_rel=0.24889452755451202, max_rel=1906.2498779296875, norm_rel=0.023522712290287018, ref_abs_avg=30.123849868774414, test_abs_avg=30.12006378173828
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5856449604034424, max_abs=2.0, mean_rel=0.09608136862516403, max_rel=3.8715169429779053, norm_rel=0.024566415697336197, ref_abs_avg=23.33643341064453, test_abs_avg=23.350788116455078
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7205353379249573, max_abs=5.0, mean_rel=0.15836571156978607, max_rel=1020.8510131835938, norm_rel=0.024585889652371407, ref_abs_avg=29.413192749023438, test_abs_avg=29.414104461669922
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6720161437988281, max_abs=4.5, mean_rel=0.2452745884656906, max_rel=2125.0, norm_rel=0.022918708622455597, ref_abs_avg=29.36162757873535, test_abs_avg=29.3682804107666
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6248836517333984, max_abs=2.75, mean_rel=0.08883970975875854, max_rel=11.596843719482422, norm_rel=0.024646209552884102, ref_abs_avg=25.717899322509766, test_abs_avg=25.65979766845703
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8097708225250244, max_abs=6.0, mean_rel=0.17878668010234833, max_rel=1933.3104248046875, norm_rel=0.02606690488755703, ref_abs_avg=31.173641204833984, test_abs_avg=31.173023223876953
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7500927448272705, max_abs=4.25, mean_rel=0.27842170000076294, max_rel=1765.6248779296875, norm_rel=0.024493204429745674, ref_abs_avg=30.651351928710938, test_abs_avg=30.648527145385742
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5850539207458496, max_abs=2.375, mean_rel=0.061205871403217316, max_rel=4.176188945770264, norm_rel=0.02326490730047226, ref_abs_avg=25.332473754882812, test_abs_avg=25.328222274780273
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7494893074035645, max_abs=6.0, mean_rel=0.16291159391403198, max_rel=830.2885131835938, norm_rel=0.025669172406196594, ref_abs_avg=29.237293243408203, test_abs_avg=29.23794174194336
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.698430061340332, max_abs=4.625, mean_rel=0.24295347929000854, max_rel=2437.5, norm_rel=0.023900821805000305, ref_abs_avg=29.1878662109375, test_abs_avg=29.195770263671875
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.537490725517273, max_abs=2.90625, mean_rel=0.14656993746757507, max_rel=15.571151733398438, norm_rel=0.024439146742224693, ref_abs_avg=22.020174026489258, test_abs_avg=21.998937606811523
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.693641722202301, max_abs=6.0, mean_rel=0.17418518662452698, max_rel=872.8128662109375, norm_rel=0.025130709633231163, ref_abs_avg=27.658794403076172, test_abs_avg=27.657608032226562
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6426241993904114, max_abs=4.625, mean_rel=0.29708755016326904, max_rel=1968.7498779296875, norm_rel=0.02371104247868061, ref_abs_avg=27.14288330078125, test_abs_avg=27.14259147644043
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5134928226470947, max_abs=1.75, mean_rel=0.142677903175354, max_rel=16.33245086669922, norm_rel=0.024313002824783325, ref_abs_avg=20.925762176513672, test_abs_avg=20.9377384185791
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6479251980781555, max_abs=4.5, mean_rel=0.15535928308963776, max_rel=1141.189697265625, norm_rel=0.02481594868004322, ref_abs_avg=26.172117233276367, test_abs_avg=26.17188262939453
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6014389991760254, max_abs=4.3125, mean_rel=0.26020753383636475, max_rel=1499.9998779296875, norm_rel=0.023384060710668564, ref_abs_avg=25.738441467285156, test_abs_avg=25.735015869140625
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4634087085723877, max_abs=2.25, mean_rel=0.1366235464811325, max_rel=12.231698036193848, norm_rel=0.022973280400037766, ref_abs_avg=20.494718551635742, test_abs_avg=20.478858947753906
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6121716499328613, max_abs=6.0, mean_rel=0.15251997113227844, max_rel=1039.927734375, norm_rel=0.024466753005981445, ref_abs_avg=25.013874053955078, test_abs_avg=25.012584686279297
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5639790296554565, max_abs=3.75, mean_rel=0.24220219254493713, max_rel=1749.9998779296875, norm_rel=0.022592371329665184, ref_abs_avg=24.941043853759766, test_abs_avg=24.944412231445312
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4534780979156494, max_abs=1.8125, mean_rel=0.07941481471061707, max_rel=2.288123369216919, norm_rel=0.023111632093787193, ref_abs_avg=19.94070816040039, test_abs_avg=19.926570892333984
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5777039527893066, max_abs=4.875, mean_rel=0.14718016982078552, max_rel=1425.16259765625, norm_rel=0.023757316172122955, ref_abs_avg=24.32443618774414, test_abs_avg=24.325166702270508
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5300771594047546, max_abs=3.84375, mean_rel=0.23079006373882294, max_rel=1570.3123779296875, norm_rel=0.02223040908575058, ref_abs_avg=23.79714584350586, test_abs_avg=23.79825210571289
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4567899703979492, max_abs=1.875, mean_rel=0.09340253472328186, max_rel=12.922340393066406, norm_rel=0.023519521579146385, ref_abs_avg=19.243696212768555, test_abs_avg=19.230934143066406
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5541373491287231, max_abs=5.0, mean_rel=0.148302361369133, max_rel=683.1936645507812, norm_rel=0.023320522159337997, ref_abs_avg=23.71072006225586, test_abs_avg=23.71071434020996
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5003555417060852, max_abs=3.5, mean_rel=0.21661275625228882, max_rel=1374.9998779296875, norm_rel=0.02160368300974369, ref_abs_avg=23.15892791748047, test_abs_avg=23.161531448364258
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.3968348503112793, max_abs=1.5, mean_rel=0.09776750206947327, max_rel=13.573921203613281, norm_rel=0.020997144281864166, ref_abs_avg=18.898651123046875, test_abs_avg=18.901641845703125
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.519882321357727, max_abs=4.5, mean_rel=0.142215758562088, max_rel=621.274169921875, norm_rel=0.023012610152363777, ref_abs_avg=22.619670867919922, test_abs_avg=22.619674682617188
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.47159498929977417, max_abs=4.375, mean_rel=0.2180965542793274, max_rel=2140.625, norm_rel=0.021093927323818207, ref_abs_avg=22.30689811706543, test_abs_avg=22.30646514892578
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.44945526123046875, max_abs=1.75, mean_rel=0.09512637555599213, max_rel=8.571538925170898, norm_rel=0.02491731196641922, ref_abs_avg=18.335247039794922, test_abs_avg=18.33913803100586
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5576828718185425, max_abs=5.0, mean_rel=0.16216465830802917, max_rel=1074.5877685546875, norm_rel=0.025099579244852066, ref_abs_avg=22.265291213989258, test_abs_avg=22.267118453979492
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5136623382568359, max_abs=4.0, mean_rel=0.2109692394733429, max_rel=1656.2498779296875, norm_rel=0.023197941482067108, ref_abs_avg=22.141754150390625, test_abs_avg=22.146358489990234
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.39531826972961426, max_abs=1.5, mean_rel=0.08648855984210968, max_rel=2.8187782764434814, norm_rel=0.022333981469273567, ref_abs_avg=17.57805824279785, test_abs_avg=17.580642700195312
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5193755030632019, max_abs=5.25, mean_rel=0.1652829945087433, max_rel=1044.635498046875, norm_rel=0.024321231991052628, ref_abs_avg=21.350357055664062, test_abs_avg=21.35021209716797
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.472980260848999, max_abs=4.0, mean_rel=0.2333572953939438, max_rel=1406.2498779296875, norm_rel=0.022841067984700203, ref_abs_avg=20.67032241821289, test_abs_avg=20.677997589111328
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.37484413385391235, max_abs=1.42578125, mean_rel=0.13799777626991272, max_rel=15.065293312072754, norm_rel=0.022572634741663933, ref_abs_avg=16.57541275024414, test_abs_avg=16.56696891784668
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.47746679186820984, max_abs=5.0, mean_rel=0.1482815146446228, max_rel=1086.7852783203125, norm_rel=0.023855092003941536, ref_abs_avg=20.048946380615234, test_abs_avg=20.050159454345703
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4443745017051697, max_abs=3.5, mean_rel=0.21119682490825653, max_rel=1499.9998779296875, norm_rel=0.022588547319173813, ref_abs_avg=19.752817153930664, test_abs_avg=19.755666732788086
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3520662784576416, max_abs=1.375, mean_rel=0.08906983584165573, max_rel=10.942718505859375, norm_rel=0.022032802924513817, ref_abs_avg=15.970255851745605, test_abs_avg=15.958902359008789
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4515937566757202, max_abs=5.0, mean_rel=0.14114780724048615, max_rel=897.2989501953125, norm_rel=0.023375065997242928, ref_abs_avg=19.36844825744629, test_abs_avg=19.369556427001953
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4095819890499115, max_abs=3.375, mean_rel=0.19469043612480164, max_rel=1437.4998779296875, norm_rel=0.02161499857902527, ref_abs_avg=19.009563446044922, test_abs_avg=19.011871337890625
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.31844234466552734, max_abs=1.125, mean_rel=0.10580817610025406, max_rel=9.64224624633789, norm_rel=0.020634055137634277, ref_abs_avg=15.638330459594727, test_abs_avg=15.622223854064941
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4153578281402588, max_abs=4.0, mean_rel=0.1435479074716568, max_rel=1761.7724609375, norm_rel=0.02263534627854824, ref_abs_avg=18.4451961517334, test_abs_avg=18.44660186767578
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.3831756114959717, max_abs=3.25, mean_rel=0.19961446523666382, max_rel=1499.9998779296875, norm_rel=0.020782671868801117, ref_abs_avg=18.477798461914062, test_abs_avg=18.47829818725586
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.30625176429748535, max_abs=1.3125, mean_rel=0.09056811779737473, max_rel=10.014181137084961, norm_rel=0.02022790163755417, ref_abs_avg=15.530938148498535, test_abs_avg=15.525154113769531
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.39795148372650146, max_abs=4.0, mean_rel=0.1362844556570053, max_rel=661.1907958984375, norm_rel=0.022217171266674995, ref_abs_avg=18.061786651611328, test_abs_avg=18.06348419189453
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3655243515968323, max_abs=3.5, mean_rel=0.17310220003128052, max_rel=1656.2498779296875, norm_rel=0.020566871389746666, ref_abs_avg=17.920867919921875, test_abs_avg=17.923694610595703
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.2829437255859375, max_abs=1.1875, mean_rel=0.08862204849720001, max_rel=13.019368171691895, norm_rel=0.020915815606713295, ref_abs_avg=13.803266525268555, test_abs_avg=13.81741714477539
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.37264132499694824, max_abs=5.0, mean_rel=0.13122664391994476, max_rel=628.721435546875, norm_rel=0.02198680303990841, ref_abs_avg=17.142784118652344, test_abs_avg=17.143383026123047
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3437475562095642, max_abs=3.75, mean_rel=0.18901899456977844, max_rel=1374.9998779296875, norm_rel=0.020505094900727272, ref_abs_avg=17.063758850097656, test_abs_avg=17.066158294677734
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2918749451637268, max_abs=1.375, mean_rel=0.09743759781122208, max_rel=13.742169380187988, norm_rel=0.020838826894760132, ref_abs_avg=14.096046447753906, test_abs_avg=14.10653305053711
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.35459908843040466, max_abs=4.5, mean_rel=0.12234190851449966, max_rel=493.0501403808594, norm_rel=0.02123439498245716, ref_abs_avg=16.993671417236328, test_abs_avg=16.992847442626953
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.32016709446907043, max_abs=3.0, mean_rel=0.17150357365608215, max_rel=1265.625, norm_rel=0.0192300733178854, ref_abs_avg=16.94099998474121, test_abs_avg=16.93467140197754
production_forward2 vs paper_forward output: mean_abs=0.0016249200562015176, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008306545205414295, max_abs=0.37109375, mean_rel=0.07174129784107208, max_rel=143.74325561523438, norm_rel=0.01966191455721855, ref_abs_avg=0.45904994010925293, test_abs_avg=0.4590739607810974
production_forward2 grad[1] vs paper_forward: mean_abs=7.172539710998535, max_abs=52.0, mean_rel=0.23086392879486084, max_rel=1232.5377197265625, norm_rel=0.02011680044233799, ref_abs_avg=314.73284912109375, test_abs_avg=314.72113037109375
production_forward2 grad[2] vs paper_forward: mean_abs=1.2512116432189941, max_abs=4.5, mean_rel=0.09670110791921616, max_rel=12.483355522155762, norm_rel=0.02360294759273529, ref_abs_avg=53.657623291015625, test_abs_avg=53.58723831176758
production_forward2 grad[3] vs paper_forward: mean_abs=1.609321117401123, max_abs=12.0, mean_rel=0.17265847325325012, max_rel=2893.66748046875, norm_rel=0.024210026487708092, ref_abs_avg=66.95124053955078, test_abs_avg=66.95579528808594
production_forward2 grad[4] vs paper_forward: mean_abs=1.4881222248077393, max_abs=9.0, mean_rel=0.40187299251556396, max_rel=4437.5, norm_rel=0.022610699757933617, ref_abs_avg=66.2353515625, test_abs_avg=66.2419662475586
production_forward2 grad[5] vs paper_forward: mean_abs=1.095576286315918, max_abs=4.375, mean_rel=0.12355822324752808, max_rel=20.909807205200195, norm_rel=0.021551137790083885, ref_abs_avg=50.77986526489258, test_abs_avg=50.70476531982422
production_forward2 grad[6] vs paper_forward: mean_abs=1.3977489471435547, max_abs=9.0, mean_rel=0.16459617018699646, max_rel=2655.027587890625, norm_rel=0.023900270462036133, ref_abs_avg=58.918357849121094, test_abs_avg=58.922279357910156
production_forward2 grad[7] vs paper_forward: mean_abs=1.2849594354629517, max_abs=7.5, mean_rel=0.38772937655448914, max_rel=3374.999755859375, norm_rel=0.022171499207615852, ref_abs_avg=58.356788635253906, test_abs_avg=58.360843658447266
production_forward2 grad[8] vs paper_forward: mean_abs=1.0180931091308594, max_abs=4.375, mean_rel=0.09521771967411041, max_rel=8.896187782287598, norm_rel=0.02359381690621376, ref_abs_avg=44.139198303222656, test_abs_avg=44.12858200073242
production_forward2 grad[9] vs paper_forward: mean_abs=1.2720451354980469, max_abs=9.0, mean_rel=0.15708285570144653, max_rel=1719.86376953125, norm_rel=0.02381480485200882, ref_abs_avg=53.80351638793945, test_abs_avg=53.80965042114258
production_forward2 grad[10] vs paper_forward: mean_abs=1.1742055416107178, max_abs=7.0, mean_rel=0.32917582988739014, max_rel=3312.499755859375, norm_rel=0.02216903492808342, ref_abs_avg=53.212928771972656, test_abs_avg=53.225181579589844
production_forward2 grad[11] vs paper_forward: mean_abs=0.9403972625732422, max_abs=4.0, mean_rel=0.18043124675750732, max_rel=28.921911239624023, norm_rel=0.023637570440769196, ref_abs_avg=39.82054901123047, test_abs_avg=39.72458267211914
production_forward2 grad[12] vs paper_forward: mean_abs=1.1800508499145508, max_abs=9.0, mean_rel=0.15277567505836487, max_rel=1199.2994384765625, norm_rel=0.023595118895173073, ref_abs_avg=50.37930679321289, test_abs_avg=50.38459777832031
production_forward2 grad[13] vs paper_forward: mean_abs=1.0800089836120605, max_abs=6.75, mean_rel=0.31774720549583435, max_rel=3874.999755859375, norm_rel=0.021917227655649185, ref_abs_avg=49.530914306640625, test_abs_avg=49.53687286376953
production_forward2 grad[14] vs paper_forward: mean_abs=0.8394813537597656, max_abs=3.75, mean_rel=0.08390162140130997, max_rel=7.075746536254883, norm_rel=0.02257753349840641, ref_abs_avg=37.634517669677734, test_abs_avg=37.72545623779297
production_forward2 grad[15] vs paper_forward: mean_abs=1.0897717475891113, max_abs=8.0, mean_rel=0.14534097909927368, max_rel=961.0870361328125, norm_rel=0.023493692278862, ref_abs_avg=46.71833801269531, test_abs_avg=46.722267150878906
production_forward2 grad[16] vs paper_forward: mean_abs=1.004586935043335, max_abs=6.0, mean_rel=0.2934938371181488, max_rel=3656.249755859375, norm_rel=0.02177765592932701, ref_abs_avg=46.41313934326172, test_abs_avg=46.416748046875
production_forward2 grad[17] vs paper_forward: mean_abs=0.8279609680175781, max_abs=3.0, mean_rel=0.07959480583667755, max_rel=6.260652542114258, norm_rel=0.02348591387271881, ref_abs_avg=35.599334716796875, test_abs_avg=35.58278274536133
production_forward2 grad[18] vs paper_forward: mean_abs=1.034170150756836, max_abs=7.375, mean_rel=0.14192867279052734, max_rel=579.8646240234375, norm_rel=0.02329239249229431, ref_abs_avg=44.66448211669922, test_abs_avg=44.667633056640625
production_forward2 grad[19] vs paper_forward: mean_abs=0.9456376433372498, max_abs=5.59375, mean_rel=0.27580827474594116, max_rel=2218.75, norm_rel=0.0218039657920599, ref_abs_avg=43.53199768066406, test_abs_avg=43.53443145751953
production_forward2 grad[20] vs paper_forward: mean_abs=0.7743062973022461, max_abs=3.25, mean_rel=0.09226440638303757, max_rel=10.454981803894043, norm_rel=0.02213202603161335, ref_abs_avg=35.98912048339844, test_abs_avg=35.99641418457031
production_forward2 grad[21] vs paper_forward: mean_abs=0.9808015823364258, max_abs=7.0, mean_rel=0.15404967963695526, max_rel=953.7110595703125, norm_rel=0.02326895296573639, ref_abs_avg=42.420265197753906, test_abs_avg=42.424560546875
production_forward2 grad[22] vs paper_forward: mean_abs=0.9004380702972412, max_abs=6.125, mean_rel=0.3000533878803253, max_rel=3124.999755859375, norm_rel=0.021656405180692673, ref_abs_avg=41.7896728515625, test_abs_avg=41.79241180419922
production_forward2 grad[23] vs paper_forward: mean_abs=0.7263975143432617, max_abs=2.75, mean_rel=0.06337577104568481, max_rel=1.9637550115585327, norm_rel=0.021860092878341675, ref_abs_avg=33.269832611083984, test_abs_avg=33.25592041015625
production_forward2 grad[24] vs paper_forward: mean_abs=0.93125319480896, max_abs=8.0, mean_rel=0.16003422439098358, max_rel=1757.6077880859375, norm_rel=0.023108042776584625, ref_abs_avg=40.59735107421875, test_abs_avg=40.59941482543945
production_forward2 grad[25] vs paper_forward: mean_abs=0.8551899194717407, max_abs=5.0, mean_rel=0.23747007548809052, max_rel=3156.249755859375, norm_rel=0.021395450457930565, ref_abs_avg=40.18357849121094, test_abs_avg=40.183387756347656
production_forward2 grad[26] vs paper_forward: mean_abs=0.8004405498504639, max_abs=4.4375, mean_rel=0.11788035929203033, max_rel=14.338301658630371, norm_rel=0.02302624098956585, ref_abs_avg=35.70734405517578, test_abs_avg=35.71240234375
production_forward2 grad[27] vs paper_forward: mean_abs=1.0741682052612305, max_abs=10.0, mean_rel=0.17397567629814148, max_rel=1665.4246826171875, norm_rel=0.024959323927760124, ref_abs_avg=43.27574920654297, test_abs_avg=43.279144287109375
production_forward2 grad[28] vs paper_forward: mean_abs=0.992914080619812, max_abs=6.90625, mean_rel=0.2985950708389282, max_rel=2906.249755859375, norm_rel=0.023293130099773407, ref_abs_avg=42.773773193359375, test_abs_avg=42.77903366088867
production_forward2 grad[29] vs paper_forward: mean_abs=0.7986912727355957, max_abs=3.125, mean_rel=0.1331602782011032, max_rel=14.904444694519043, norm_rel=0.024667538702487946, ref_abs_avg=32.54298400878906, test_abs_avg=32.55059051513672
production_forward2 grad[30] vs paper_forward: mean_abs=0.9995612502098083, max_abs=7.0, mean_rel=0.17220503091812134, max_rel=1576.409423828125, norm_rel=0.025390418246388435, ref_abs_avg=39.6002082824707, test_abs_avg=39.60150909423828
production_forward2 grad[31] vs paper_forward: mean_abs=0.9323668479919434, max_abs=6.0, mean_rel=0.3473207354545593, max_rel=2749.999755859375, norm_rel=0.02384822815656662, ref_abs_avg=39.19947052001953, test_abs_avg=39.20320129394531
production_forward2 grad[32] vs paper_forward: mean_abs=0.717207670211792, max_abs=3.1875, mean_rel=0.1098640039563179, max_rel=17.369813919067383, norm_rel=0.02459469996392727, ref_abs_avg=30.383373260498047, test_abs_avg=30.419145584106445
production_forward2 grad[33] vs paper_forward: mean_abs=0.9313628673553467, max_abs=6.0, mean_rel=0.17155420780181885, max_rel=1516.0369873046875, norm_rel=0.02518024854362011, ref_abs_avg=37.12568283081055, test_abs_avg=37.12870788574219
production_forward2 grad[34] vs paper_forward: mean_abs=0.8731162548065186, max_abs=5.25, mean_rel=0.33678317070007324, max_rel=3156.249755859375, norm_rel=0.024019014090299606, ref_abs_avg=36.418251037597656, test_abs_avg=36.42169952392578
production_forward2 grad[35] vs paper_forward: mean_abs=0.6830329895019531, max_abs=2.75, mean_rel=0.246966153383255, max_rel=74.83293151855469, norm_rel=0.024378644302487373, ref_abs_avg=28.30740737915039, test_abs_avg=28.28736114501953
production_forward2 grad[36] vs paper_forward: mean_abs=0.8742415904998779, max_abs=6.0, mean_rel=0.17181377112865448, max_rel=2485.587158203125, norm_rel=0.025076311081647873, ref_abs_avg=35.04063415527344, test_abs_avg=35.041378021240234
production_forward2 grad[37] vs paper_forward: mean_abs=0.812264621257782, max_abs=4.921875, mean_rel=0.25438565015792847, max_rel=2125.0, norm_rel=0.023585913702845573, ref_abs_avg=34.50562286376953, test_abs_avg=34.51274871826172
production_forward2 grad[38] vs paper_forward: mean_abs=0.6240799427032471, max_abs=2.5, mean_rel=0.43621352314949036, max_rel=187.72401428222656, norm_rel=0.02324431948363781, ref_abs_avg=27.21369171142578, test_abs_avg=27.230688095092773
production_forward2 grad[39] vs paper_forward: mean_abs=0.8272293210029602, max_abs=6.0, mean_rel=0.1609855592250824, max_rel=1405.8453369140625, norm_rel=0.024842804297804832, ref_abs_avg=33.41119384765625, test_abs_avg=33.41373825073242
production_forward2 grad[40] vs paper_forward: mean_abs=0.7738105058670044, max_abs=4.75, mean_rel=0.3094387352466583, max_rel=2812.499755859375, norm_rel=0.023526886478066444, ref_abs_avg=33.01679229736328, test_abs_avg=33.01323318481445
production_forward2 grad[41] vs paper_forward: mean_abs=0.6153030395507812, max_abs=2.5, mean_rel=0.11670680344104767, max_rel=5.89670991897583, norm_rel=0.023624354973435402, ref_abs_avg=25.18268585205078, test_abs_avg=25.198631286621094
production_forward2 grad[42] vs paper_forward: mean_abs=0.7817599773406982, max_abs=6.5, mean_rel=0.16124242544174194, max_rel=1529.216796875, norm_rel=0.02462269552052021, ref_abs_avg=31.831653594970703, test_abs_avg=31.83371925354004
production_forward2 grad[43] vs paper_forward: mean_abs=0.7311904430389404, max_abs=4.5, mean_rel=0.2979790270328522, max_rel=3656.249755859375, norm_rel=0.023067036643624306, ref_abs_avg=31.732929229736328, test_abs_avg=31.73493194580078
production_forward2 grad[44] vs paper_forward: mean_abs=0.5895910263061523, max_abs=2.75, mean_rel=0.08538729697465897, max_rel=4.211636543273926, norm_rel=0.024023691192269325, ref_abs_avg=24.55997085571289, test_abs_avg=24.58806610107422
production_forward2 grad[45] vs paper_forward: mean_abs=0.7489688396453857, max_abs=6.0, mean_rel=0.15901967883110046, max_rel=1081.2301025390625, norm_rel=0.024424521252512932, ref_abs_avg=30.75070571899414, test_abs_avg=30.75277328491211
production_forward2 grad[46] vs paper_forward: mean_abs=0.6950820088386536, max_abs=4.5, mean_rel=0.2289971113204956, max_rel=1687.4998779296875, norm_rel=0.02315494976937771, ref_abs_avg=30.123849868774414, test_abs_avg=30.120908737182617
production_forward2 grad[47] vs paper_forward: mean_abs=0.5657107830047607, max_abs=2.25, mean_rel=0.1016082614660263, max_rel=3.732009172439575, norm_rel=0.024096651002764702, ref_abs_avg=23.33643341064453, test_abs_avg=23.343013763427734
production_forward2 grad[48] vs paper_forward: mean_abs=0.7114270925521851, max_abs=6.0, mean_rel=0.1542668640613556, max_rel=1040.0006103515625, norm_rel=0.024274714291095734, ref_abs_avg=29.413192749023438, test_abs_avg=29.414085388183594
production_forward2 grad[49] vs paper_forward: mean_abs=0.6606453657150269, max_abs=4.625, mean_rel=0.24080640077590942, max_rel=2062.5, norm_rel=0.02254238724708557, ref_abs_avg=29.36162757873535, test_abs_avg=29.369338989257812
production_forward2 grad[50] vs paper_forward: mean_abs=0.6273584365844727, max_abs=2.4375, mean_rel=0.09150325506925583, max_rel=14.498266220092773, norm_rel=0.024370545521378517, ref_abs_avg=25.717899322509766, test_abs_avg=25.665300369262695
production_forward2 grad[51] vs paper_forward: mean_abs=0.7983205914497375, max_abs=5.5, mean_rel=0.17305578291416168, max_rel=1672.256103515625, norm_rel=0.02569781243801117, ref_abs_avg=31.173641204833984, test_abs_avg=31.176849365234375
production_forward2 grad[52] vs paper_forward: mean_abs=0.7347675561904907, max_abs=5.109375, mean_rel=0.26268404722213745, max_rel=1937.4998779296875, norm_rel=0.023998813703656197, ref_abs_avg=30.651351928710938, test_abs_avg=30.648380279541016
production_forward2 grad[53] vs paper_forward: mean_abs=0.5749092102050781, max_abs=2.03125, mean_rel=0.06121927127242088, max_rel=2.0676896572113037, norm_rel=0.022682061418890953, ref_abs_avg=25.332473754882812, test_abs_avg=25.333820343017578
production_forward2 grad[54] vs paper_forward: mean_abs=0.7402133345603943, max_abs=6.5, mean_rel=0.163090318441391, max_rel=620.0584106445312, norm_rel=0.025354428216814995, ref_abs_avg=29.237293243408203, test_abs_avg=29.23755645751953
production_forward2 grad[55] vs paper_forward: mean_abs=0.6893801689147949, max_abs=4.6875, mean_rel=0.24913078546524048, max_rel=2687.499755859375, norm_rel=0.02359846420586109, ref_abs_avg=29.1878662109375, test_abs_avg=29.198833465576172
production_forward2 grad[56] vs paper_forward: mean_abs=0.5175634622573853, max_abs=2.3125, mean_rel=0.17304876446723938, max_rel=36.045021057128906, norm_rel=0.023579267784953117, ref_abs_avg=22.020174026489258, test_abs_avg=21.98907470703125
production_forward2 grad[57] vs paper_forward: mean_abs=0.6844199895858765, max_abs=5.0, mean_rel=0.17218968272209167, max_rel=976.7609252929688, norm_rel=0.024826345965266228, ref_abs_avg=27.658794403076172, test_abs_avg=27.658313751220703
production_forward2 grad[58] vs paper_forward: mean_abs=0.6347280740737915, max_abs=5.0, mean_rel=0.2932750880718231, max_rel=1953.1248779296875, norm_rel=0.02343028225004673, ref_abs_avg=27.14288330078125, test_abs_avg=27.141338348388672
production_forward2 grad[59] vs paper_forward: mean_abs=0.50636887550354, max_abs=2.0, mean_rel=0.13348686695098877, max_rel=16.322559356689453, norm_rel=0.02360982820391655, ref_abs_avg=20.925762176513672, test_abs_avg=20.93067169189453
production_forward2 grad[60] vs paper_forward: mean_abs=0.6406911611557007, max_abs=4.5, mean_rel=0.15192893147468567, max_rel=1127.0201416015625, norm_rel=0.024567466229200363, ref_abs_avg=26.172117233276367, test_abs_avg=26.170520782470703
production_forward2 grad[61] vs paper_forward: mean_abs=0.5926836729049683, max_abs=4.0625, mean_rel=0.25638771057128906, max_rel=1921.8748779296875, norm_rel=0.02303297072649002, ref_abs_avg=25.738441467285156, test_abs_avg=25.734661102294922
production_forward2 grad[62] vs paper_forward: mean_abs=0.4483177661895752, max_abs=2.125, mean_rel=0.1225639283657074, max_rel=15.659429550170898, norm_rel=0.021967120468616486, ref_abs_avg=20.494718551635742, test_abs_avg=20.490089416503906
production_forward2 grad[63] vs paper_forward: mean_abs=0.6056735515594482, max_abs=4.5, mean_rel=0.14982831478118896, max_rel=826.67431640625, norm_rel=0.024214716628193855, ref_abs_avg=25.013874053955078, test_abs_avg=25.012866973876953
production_forward2 grad[64] vs paper_forward: mean_abs=0.5590997934341431, max_abs=3.875, mean_rel=0.2470184564590454, max_rel=2125.0, norm_rel=0.022382140159606934, ref_abs_avg=24.941043853759766, test_abs_avg=24.943828582763672
production_forward2 grad[65] vs paper_forward: mean_abs=0.4581540822982788, max_abs=1.625, mean_rel=0.07557525485754013, max_rel=4.740161418914795, norm_rel=0.02313198707997799, ref_abs_avg=19.94070816040039, test_abs_avg=19.926240921020508
production_forward2 grad[66] vs paper_forward: mean_abs=0.5724322199821472, max_abs=4.5, mean_rel=0.14336548745632172, max_rel=708.910888671875, norm_rel=0.023541631177067757, ref_abs_avg=24.32443618774414, test_abs_avg=24.3249454498291
production_forward2 grad[67] vs paper_forward: mean_abs=0.5228469371795654, max_abs=4.0, mean_rel=0.23024745285511017, max_rel=1624.9998779296875, norm_rel=0.02194327674806118, ref_abs_avg=23.79714584350586, test_abs_avg=23.802188873291016
production_forward2 grad[68] vs paper_forward: mean_abs=0.4573831558227539, max_abs=1.75, mean_rel=0.10284462571144104, max_rel=14.291117668151855, norm_rel=0.02340947836637497, ref_abs_avg=19.243696212768555, test_abs_avg=19.237743377685547
production_forward2 grad[69] vs paper_forward: mean_abs=0.5500643849372864, max_abs=6.0, mean_rel=0.14387032389640808, max_rel=544.8742065429688, norm_rel=0.02316650189459324, ref_abs_avg=23.71072006225586, test_abs_avg=23.711528778076172
production_forward2 grad[70] vs paper_forward: mean_abs=0.49281686544418335, max_abs=3.5, mean_rel=0.21503981947898865, max_rel=1031.25, norm_rel=0.02124876156449318, ref_abs_avg=23.15892791748047, test_abs_avg=23.160770416259766
production_forward2 grad[71] vs paper_forward: mean_abs=0.4095425605773926, max_abs=1.5, mean_rel=0.10431265085935593, max_rel=16.57823371887207, norm_rel=0.021882805973291397, ref_abs_avg=18.898651123046875, test_abs_avg=18.891422271728516
production_forward2 grad[72] vs paper_forward: mean_abs=0.5167454481124878, max_abs=4.875, mean_rel=0.1385786086320877, max_rel=504.10321044921875, norm_rel=0.02286747470498085, ref_abs_avg=22.619670867919922, test_abs_avg=22.619701385498047
production_forward2 grad[73] vs paper_forward: mean_abs=0.4689045548439026, max_abs=4.5625, mean_rel=0.21713118255138397, max_rel=1593.7498779296875, norm_rel=0.0209704227745533, ref_abs_avg=22.30689811706543, test_abs_avg=22.307331085205078
production_forward2 grad[74] vs paper_forward: mean_abs=0.43294715881347656, max_abs=1.625, mean_rel=0.09813482314348221, max_rel=8.571538925170898, norm_rel=0.02454446442425251, ref_abs_avg=18.335247039794922, test_abs_avg=18.330312728881836
production_forward2 grad[75] vs paper_forward: mean_abs=0.5520355701446533, max_abs=4.25, mean_rel=0.1593080759048462, max_rel=1411.5301513671875, norm_rel=0.024849355220794678, ref_abs_avg=22.265291213989258, test_abs_avg=22.265357971191406
production_forward2 grad[76] vs paper_forward: mean_abs=0.5072794556617737, max_abs=3.9375, mean_rel=0.21663737297058105, max_rel=1718.7498779296875, norm_rel=0.0229216106235981, ref_abs_avg=22.141754150390625, test_abs_avg=22.14522361755371
production_forward2 grad[77] vs paper_forward: mean_abs=0.3964576721191406, max_abs=1.46484375, mean_rel=0.09210419654846191, max_rel=6.611316204071045, norm_rel=0.02263091877102852, ref_abs_avg=17.57805824279785, test_abs_avg=17.58846092224121
production_forward2 grad[78] vs paper_forward: mean_abs=0.5138413906097412, max_abs=4.5, mean_rel=0.16260787844657898, max_rel=1213.9764404296875, norm_rel=0.024072112515568733, ref_abs_avg=21.350357055664062, test_abs_avg=21.350292205810547
production_forward2 grad[79] vs paper_forward: mean_abs=0.4672657251358032, max_abs=3.25, mean_rel=0.23455612361431122, max_rel=1499.9998779296875, norm_rel=0.022538302466273308, ref_abs_avg=20.67032241821289, test_abs_avg=20.68142318725586
production_forward2 grad[80] vs paper_forward: mean_abs=0.37729722261428833, max_abs=1.65625, mean_rel=0.30329033732414246, max_rel=108.20767211914062, norm_rel=0.02272622473537922, ref_abs_avg=16.57541275024414, test_abs_avg=16.5638427734375
production_forward2 grad[81] vs paper_forward: mean_abs=0.47277066111564636, max_abs=4.25, mean_rel=0.14824265241622925, max_rel=822.6216430664062, norm_rel=0.023635024204850197, ref_abs_avg=20.048946380615234, test_abs_avg=20.049461364746094
production_forward2 grad[82] vs paper_forward: mean_abs=0.4363417327404022, max_abs=3.375, mean_rel=0.21180835366249084, max_rel=1624.9998779296875, norm_rel=0.022173341363668442, ref_abs_avg=19.752817153930664, test_abs_avg=19.75592803955078
production_forward2 grad[83] vs paper_forward: mean_abs=0.3536638021469116, max_abs=1.25, mean_rel=0.0826050266623497, max_rel=6.287319183349609, norm_rel=0.022137325257062912, ref_abs_avg=15.970255851745605, test_abs_avg=15.961563110351562
production_forward2 grad[84] vs paper_forward: mean_abs=0.4475729465484619, max_abs=5.0, mean_rel=0.14148889482021332, max_rel=755.408935546875, norm_rel=0.02318257838487625, ref_abs_avg=19.36844825744629, test_abs_avg=19.369081497192383
production_forward2 grad[85] vs paper_forward: mean_abs=0.40526023507118225, max_abs=3.5, mean_rel=0.1904873549938202, max_rel=968.7499389648438, norm_rel=0.021372536197304726, ref_abs_avg=19.009563446044922, test_abs_avg=19.009443283081055
production_forward2 grad[86] vs paper_forward: mean_abs=0.3076671361923218, max_abs=1.125, mean_rel=0.11395865678787231, max_rel=15.815652847290039, norm_rel=0.01993691734969616, ref_abs_avg=15.638330459594727, test_abs_avg=15.629992485046387
production_forward2 grad[87] vs paper_forward: mean_abs=0.4131920635700226, max_abs=5.0, mean_rel=0.14286580681800842, max_rel=1812.9461669921875, norm_rel=0.022522971034049988, ref_abs_avg=18.4451961517334, test_abs_avg=18.446149826049805
production_forward2 grad[88] vs paper_forward: mean_abs=0.3809279799461365, max_abs=2.75, mean_rel=0.20374979078769684, max_rel=1562.4998779296875, norm_rel=0.020693548023700714, ref_abs_avg=18.477798461914062, test_abs_avg=18.475170135498047
production_forward2 grad[89] vs paper_forward: mean_abs=0.3065621852874756, max_abs=1.375, mean_rel=0.08778079599142075, max_rel=10.30885124206543, norm_rel=0.020158259198069572, ref_abs_avg=15.530938148498535, test_abs_avg=15.541886329650879
production_forward2 grad[90] vs paper_forward: mean_abs=0.39615994691848755, max_abs=5.0, mean_rel=0.13580557703971863, max_rel=892.9929809570312, norm_rel=0.02213926427066326, ref_abs_avg=18.061786651611328, test_abs_avg=18.06292724609375
production_forward2 grad[91] vs paper_forward: mean_abs=0.36202576756477356, max_abs=3.5, mean_rel=0.16590747237205505, max_rel=1531.2498779296875, norm_rel=0.020302969962358475, ref_abs_avg=17.920867919921875, test_abs_avg=17.922698974609375
production_forward2 grad[92] vs paper_forward: mean_abs=0.28250694274902344, max_abs=1.125, mean_rel=0.10377135872840881, max_rel=18.424165725708008, norm_rel=0.020879477262496948, ref_abs_avg=13.803266525268555, test_abs_avg=13.806119918823242
production_forward2 grad[93] vs paper_forward: mean_abs=0.3718219995498657, max_abs=4.0, mean_rel=0.1323966085910797, max_rel=592.7666625976562, norm_rel=0.02193814143538475, ref_abs_avg=17.142784118652344, test_abs_avg=17.143009185791016
production_forward2 grad[94] vs paper_forward: mean_abs=0.3445259928703308, max_abs=3.34375, mean_rel=0.1970134675502777, max_rel=1843.7498779296875, norm_rel=0.02055184915661812, ref_abs_avg=17.063758850097656, test_abs_avg=17.060325622558594
production_forward2 grad[95] vs paper_forward: mean_abs=0.29377782344818115, max_abs=1.3125, mean_rel=0.10904556512832642, max_rel=23.159521102905273, norm_rel=0.02062632143497467, ref_abs_avg=14.096046447753906, test_abs_avg=14.106295585632324
production_forward2 grad[96] vs paper_forward: mean_abs=0.3538498878479004, max_abs=4.453125, mean_rel=0.11901307851076126, max_rel=367.0726623535156, norm_rel=0.021177014335989952, ref_abs_avg=16.993671417236328, test_abs_avg=16.992385864257812
production_forward2 grad[97] vs paper_forward: mean_abs=0.32229334115982056, max_abs=3.0625, mean_rel=0.1782553493976593, max_rel=1515.6248779296875, norm_rel=0.01936432160437107, ref_abs_avg=16.94099998474121, test_abs_avg=16.937393188476562
identity layers + randn queries
production_forward2 fwd+bwd:  113.566 ms
production_forward2 bwd-only: 95.957 ms
production_forward2 peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=2.320 GiB, fwd+bwd=10.320 GiB
production_forward fwd+bwd:  116.472 ms
production_forward bwd-only: 95.971 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.320 GiB, fwd+bwd=10.320 GiB
paper_forward fwd+bwd:  382.329 ms
paper_forward bwd-only: 302.281 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.740 GiB, fwd+bwd=32.490 GiB
torch_compile_phases_forward fwd+bwd:  166.632 ms
torch_compile_phases_forward bwd-only: 132.893 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016212068730965257, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008344028145074844, max_abs=0.4375, mean_rel=0.07268735766410828, max_rel=98.69426727294922, norm_rel=0.020024564117193222, ref_abs_avg=0.4541972875595093, test_abs_avg=0.45420145988464355
production_forward grad[1] vs paper_forward: mean_abs=7.329207897186279, max_abs=54.0, mean_rel=0.1487952023744583, max_rel=212.13546752929688, norm_rel=0.020389797165989876, ref_abs_avg=315.09918212890625, test_abs_avg=315.1235656738281
production_forward grad[2] vs paper_forward: mean_abs=1.283975601196289, max_abs=5.0, mean_rel=0.10234296321868896, max_rel=8.436094284057617, norm_rel=0.023922128602862358, ref_abs_avg=53.519554138183594, test_abs_avg=53.55940628051758
production_forward grad[3] vs paper_forward: mean_abs=1.5917468070983887, max_abs=11.0, mean_rel=0.17731627821922302, max_rel=3551.38671875, norm_rel=0.024754267185926437, ref_abs_avg=64.72822570800781, test_abs_avg=64.72511291503906
production_forward grad[4] vs paper_forward: mean_abs=1.4666087627410889, max_abs=9.0, mean_rel=0.38466107845306396, max_rel=3874.999755859375, norm_rel=0.022946391254663467, ref_abs_avg=64.17269897460938, test_abs_avg=64.16661071777344
production_forward grad[5] vs paper_forward: mean_abs=1.1690492630004883, max_abs=5.5, mean_rel=0.14244158565998077, max_rel=22.54753875732422, norm_rel=0.02608860842883587, ref_abs_avg=44.479793548583984, test_abs_avg=44.50702667236328
production_forward grad[6] vs paper_forward: mean_abs=1.3975131511688232, max_abs=9.0, mean_rel=0.16382145881652832, max_rel=1981.9376220703125, norm_rel=0.024466564878821373, ref_abs_avg=57.48408508300781, test_abs_avg=57.48663330078125
production_forward grad[7] vs paper_forward: mean_abs=1.2864971160888672, max_abs=7.375, mean_rel=0.31657958030700684, max_rel=4187.5, norm_rel=0.02281818352639675, ref_abs_avg=56.65764617919922, test_abs_avg=56.6588134765625
production_forward grad[8] vs paper_forward: mean_abs=1.003499984741211, max_abs=4.5, mean_rel=0.07611415535211563, max_rel=3.493788957595825, norm_rel=0.02377384901046753, ref_abs_avg=41.556983947753906, test_abs_avg=41.61647033691406
production_forward grad[9] vs paper_forward: mean_abs=1.2653226852416992, max_abs=8.0, mean_rel=0.15823781490325928, max_rel=1272.6239013671875, norm_rel=0.024324363097548485, ref_abs_avg=52.361183166503906, test_abs_avg=52.362945556640625
production_forward grad[10] vs paper_forward: mean_abs=1.1612536907196045, max_abs=7.28125, mean_rel=0.29891204833984375, max_rel=3124.999755859375, norm_rel=0.022574149072170258, ref_abs_avg=51.70541000366211, test_abs_avg=51.69804382324219
production_forward grad[11] vs paper_forward: mean_abs=0.9274752140045166, max_abs=3.625, mean_rel=0.3208664655685425, max_rel=92.37419128417969, norm_rel=0.024142101407051086, ref_abs_avg=39.22953796386719, test_abs_avg=39.2235107421875
production_forward grad[12] vs paper_forward: mean_abs=1.1721596717834473, max_abs=8.0, mean_rel=0.15834304690361023, max_rel=1034.838623046875, norm_rel=0.024240171536803246, ref_abs_avg=48.67662048339844, test_abs_avg=48.67596435546875
production_forward grad[13] vs paper_forward: mean_abs=1.0828213691711426, max_abs=6.75, mean_rel=0.2590213418006897, max_rel=3062.499755859375, norm_rel=0.02250564470887184, ref_abs_avg=48.36091995239258, test_abs_avg=48.36450958251953
production_forward grad[14] vs paper_forward: mean_abs=0.8487882614135742, max_abs=4.0, mean_rel=0.09867718815803528, max_rel=4.948365211486816, norm_rel=0.02183820866048336, ref_abs_avg=38.114097595214844, test_abs_avg=38.038238525390625
production_forward grad[15] vs paper_forward: mean_abs=1.096403956413269, max_abs=8.0, mean_rel=0.1589011549949646, max_rel=2371.199462890625, norm_rel=0.024084191769361496, ref_abs_avg=45.83657455444336, test_abs_avg=45.83586120605469
production_forward grad[16] vs paper_forward: mean_abs=1.0107853412628174, max_abs=6.25, mean_rel=0.30688029527664185, max_rel=3312.499755859375, norm_rel=0.022458083927631378, ref_abs_avg=45.26287841796875, test_abs_avg=45.254486083984375
production_forward grad[17] vs paper_forward: mean_abs=0.8099145889282227, max_abs=3.5, mean_rel=0.0954168289899826, max_rel=11.202981948852539, norm_rel=0.02254973165690899, ref_abs_avg=35.86454391479492, test_abs_avg=35.813926696777344
production_forward grad[18] vs paper_forward: mean_abs=1.0316177606582642, max_abs=6.5, mean_rel=0.1594933569431305, max_rel=2233.74560546875, norm_rel=0.023851510137319565, ref_abs_avg=43.50834274291992, test_abs_avg=43.50830078125
production_forward grad[19] vs paper_forward: mean_abs=0.9529284238815308, max_abs=6.0, mean_rel=0.2591550350189209, max_rel=2874.999755859375, norm_rel=0.02226700447499752, ref_abs_avg=43.004150390625, test_abs_avg=43.006683349609375
production_forward grad[20] vs paper_forward: mean_abs=0.7961905002593994, max_abs=3.0, mean_rel=0.14529083669185638, max_rel=27.723936080932617, norm_rel=0.023296896368265152, ref_abs_avg=32.862335205078125, test_abs_avg=32.86444854736328
production_forward grad[21] vs paper_forward: mean_abs=0.9750512838363647, max_abs=7.0, mean_rel=0.16184721887111664, max_rel=1223.3934326171875, norm_rel=0.023769458755850792, ref_abs_avg=41.29765319824219, test_abs_avg=41.296546936035156
production_forward grad[22] vs paper_forward: mean_abs=0.8994529247283936, max_abs=5.75, mean_rel=0.33935630321502686, max_rel=2937.499755859375, norm_rel=0.022267252206802368, ref_abs_avg=40.547096252441406, test_abs_avg=40.55076217651367
production_forward grad[23] vs paper_forward: mean_abs=0.7014451026916504, max_abs=2.875, mean_rel=0.10183396190404892, max_rel=8.774178504943848, norm_rel=0.020978856831789017, ref_abs_avg=33.0484733581543, test_abs_avg=33.030799865722656
production_forward grad[24] vs paper_forward: mean_abs=0.9339953660964966, max_abs=7.0, mean_rel=0.14536529779434204, max_rel=928.9420776367188, norm_rel=0.023611342534422874, ref_abs_avg=39.84115219116211, test_abs_avg=39.84117889404297
production_forward grad[25] vs paper_forward: mean_abs=0.8556268215179443, max_abs=5.0, mean_rel=0.33055374026298523, max_rel=3062.499755859375, norm_rel=0.0219579990953207, ref_abs_avg=39.149497985839844, test_abs_avg=39.148292541503906
production_forward grad[26] vs paper_forward: mean_abs=0.7898006439208984, max_abs=3.75, mean_rel=0.07476109266281128, max_rel=4.633105278015137, norm_rel=0.023860814049839973, ref_abs_avg=33.42749786376953, test_abs_avg=33.38949203491211
production_forward grad[27] vs paper_forward: mean_abs=1.0766994953155518, max_abs=7.0, mean_rel=0.1781996786594391, max_rel=1799.162109375, norm_rel=0.025657901540398598, ref_abs_avg=42.20874786376953, test_abs_avg=42.21043395996094
production_forward grad[28] vs paper_forward: mean_abs=1.0021271705627441, max_abs=6.5, mean_rel=0.2868417799472809, max_rel=2406.25, norm_rel=0.02401801571249962, ref_abs_avg=41.9300537109375, test_abs_avg=41.94843292236328
production_forward grad[29] vs paper_forward: mean_abs=0.7836956977844238, max_abs=3.25, mean_rel=0.10048027336597443, max_rel=17.214183807373047, norm_rel=0.024264948442578316, ref_abs_avg=32.34141540527344, test_abs_avg=32.355384826660156
production_forward grad[30] vs paper_forward: mean_abs=1.0031943321228027, max_abs=7.6953125, mean_rel=0.1734468638896942, max_rel=1309.8624267578125, norm_rel=0.02594907581806183, ref_abs_avg=38.836814880371094, test_abs_avg=38.83595657348633
production_forward grad[31] vs paper_forward: mean_abs=0.9341509342193604, max_abs=6.0, mean_rel=0.2903285026550293, max_rel=3609.374755859375, norm_rel=0.024503078311681747, ref_abs_avg=38.275840759277344, test_abs_avg=38.28383255004883
production_forward grad[32] vs paper_forward: mean_abs=0.7152862548828125, max_abs=3.5625, mean_rel=0.08715416491031647, max_rel=3.6978282928466797, norm_rel=0.024636603891849518, ref_abs_avg=29.622373580932617, test_abs_avg=29.61930274963379
production_forward grad[33] vs paper_forward: mean_abs=0.9253224730491638, max_abs=6.5, mean_rel=0.17275016009807587, max_rel=1377.0108642578125, norm_rel=0.025804489850997925, ref_abs_avg=36.02243423461914, test_abs_avg=36.02223587036133
production_forward grad[34] vs paper_forward: mean_abs=0.8677273392677307, max_abs=5.5, mean_rel=0.2750721871852875, max_rel=3124.999755859375, norm_rel=0.024537744000554085, ref_abs_avg=35.48280334472656, test_abs_avg=35.47732925415039
production_forward grad[35] vs paper_forward: mean_abs=0.6611151695251465, max_abs=2.75, mean_rel=0.13399966061115265, max_rel=24.639305114746094, norm_rel=0.024600563570857048, ref_abs_avg=27.127288818359375, test_abs_avg=27.00399398803711
production_forward grad[36] vs paper_forward: mean_abs=0.8726345300674438, max_abs=7.0, mean_rel=0.16604726016521454, max_rel=1380.81982421875, norm_rel=0.02573946863412857, ref_abs_avg=33.98979568481445, test_abs_avg=33.98751449584961
production_forward grad[37] vs paper_forward: mean_abs=0.8155354857444763, max_abs=4.75, mean_rel=0.3240383267402649, max_rel=2937.499755859375, norm_rel=0.02417689561843872, ref_abs_avg=33.86103820800781, test_abs_avg=33.86182403564453
production_forward grad[38] vs paper_forward: mean_abs=0.6208219528198242, max_abs=2.125, mean_rel=0.06380072981119156, max_rel=2.360760450363159, norm_rel=0.02271498553454876, ref_abs_avg=27.46291732788086, test_abs_avg=27.451446533203125
production_forward grad[39] vs paper_forward: mean_abs=0.8223017454147339, max_abs=5.5, mean_rel=0.17056985199451447, max_rel=1450.9259033203125, norm_rel=0.02540578320622444, ref_abs_avg=32.48259353637695, test_abs_avg=32.4790153503418
production_forward grad[40] vs paper_forward: mean_abs=0.7687215209007263, max_abs=5.0, mean_rel=0.30768275260925293, max_rel=2437.5, norm_rel=0.02409197948873043, ref_abs_avg=32.00001525878906, test_abs_avg=32.00061798095703
production_forward grad[41] vs paper_forward: mean_abs=0.590721607208252, max_abs=2.25, mean_rel=0.09310197830200195, max_rel=11.022985458374023, norm_rel=0.021916259080171585, ref_abs_avg=27.41006851196289, test_abs_avg=27.3724365234375
production_forward grad[42] vs paper_forward: mean_abs=0.7849168181419373, max_abs=5.5, mean_rel=0.1652100533246994, max_rel=1265.0498046875, norm_rel=0.025279782712459564, ref_abs_avg=31.19741439819336, test_abs_avg=31.19563865661621
production_forward grad[43] vs paper_forward: mean_abs=0.7274484038352966, max_abs=4.5, mean_rel=0.27322906255722046, max_rel=2031.2498779296875, norm_rel=0.02379138395190239, ref_abs_avg=30.64480209350586, test_abs_avg=30.63745880126953
production_forward grad[44] vs paper_forward: mean_abs=0.5976477861404419, max_abs=2.421875, mean_rel=0.3248198330402374, max_rel=102.18528747558594, norm_rel=0.02478887140750885, ref_abs_avg=24.107025146484375, test_abs_avg=24.154850006103516
production_forward grad[45] vs paper_forward: mean_abs=0.7413941025733948, max_abs=5.5, mean_rel=0.1536407321691513, max_rel=885.0523681640625, norm_rel=0.02510041743516922, ref_abs_avg=29.65782928466797, test_abs_avg=29.655357360839844
production_forward grad[46] vs paper_forward: mean_abs=0.6943999528884888, max_abs=4.25, mean_rel=0.34551379084587097, max_rel=2203.125, norm_rel=0.02354259602725506, ref_abs_avg=29.573226928710938, test_abs_avg=29.571197509765625
production_forward grad[47] vs paper_forward: mean_abs=0.5664117336273193, max_abs=2.5, mean_rel=0.08495815098285675, max_rel=3.8985793590545654, norm_rel=0.023611944168806076, ref_abs_avg=24.205772399902344, test_abs_avg=24.227020263671875
production_forward grad[48] vs paper_forward: mean_abs=0.7127172946929932, max_abs=5.0, mean_rel=0.16166076064109802, max_rel=1036.3756103515625, norm_rel=0.024851126596331596, ref_abs_avg=28.772140502929688, test_abs_avg=28.771610260009766
production_forward grad[49] vs paper_forward: mean_abs=0.6650872230529785, max_abs=4.125, mean_rel=0.26161283254623413, max_rel=2312.5, norm_rel=0.023279838263988495, ref_abs_avg=28.587627410888672, test_abs_avg=28.58979034423828
production_forward grad[50] vs paper_forward: mean_abs=0.6094455718994141, max_abs=2.5, mean_rel=0.1482551395893097, max_rel=23.998554229736328, norm_rel=0.024226924404501915, ref_abs_avg=25.782583236694336, test_abs_avg=25.78582191467285
production_forward grad[51] vs paper_forward: mean_abs=0.802557110786438, max_abs=5.5, mean_rel=0.16548851132392883, max_rel=1068.1683349609375, norm_rel=0.02645450085401535, ref_abs_avg=30.43933868408203, test_abs_avg=30.441543579101562
production_forward grad[52] vs paper_forward: mean_abs=0.74439936876297, max_abs=5.1875, mean_rel=0.24199777841567993, max_rel=2140.625, norm_rel=0.024686431512236595, ref_abs_avg=30.176952362060547, test_abs_avg=30.178354263305664
production_forward grad[53] vs paper_forward: mean_abs=0.5533790588378906, max_abs=2.375, mean_rel=0.09177368879318237, max_rel=8.89439582824707, norm_rel=0.023980529978871346, ref_abs_avg=23.296977996826172, test_abs_avg=23.295482635498047
production_forward grad[54] vs paper_forward: mean_abs=0.7320895195007324, max_abs=5.0, mean_rel=0.16471588611602783, max_rel=2172.346435546875, norm_rel=0.02588881179690361, ref_abs_avg=28.34943199157715, test_abs_avg=28.349044799804688
production_forward grad[55] vs paper_forward: mean_abs=0.6875449419021606, max_abs=5.0, mean_rel=0.2799258232116699, max_rel=2406.25, norm_rel=0.024235257878899574, ref_abs_avg=28.36273956298828, test_abs_avg=28.36337661743164
production_forward grad[56] vs paper_forward: mean_abs=0.5368661880493164, max_abs=3.0, mean_rel=0.08284725993871689, max_rel=2.693014144897461, norm_rel=0.024052176624536514, ref_abs_avg=22.581302642822266, test_abs_avg=22.585811614990234
production_forward grad[57] vs paper_forward: mean_abs=0.6863005757331848, max_abs=5.5, mean_rel=0.1650591343641281, max_rel=936.3267822265625, norm_rel=0.025476766750216484, ref_abs_avg=26.99629020690918, test_abs_avg=26.9942626953125
production_forward grad[58] vs paper_forward: mean_abs=0.6403063535690308, max_abs=4.75, mean_rel=0.241871640086174, max_rel=2062.5, norm_rel=0.024057630449533463, ref_abs_avg=26.683412551879883, test_abs_avg=26.68448257446289
production_forward grad[59] vs paper_forward: mean_abs=0.49406909942626953, max_abs=2.5, mean_rel=0.09777121990919113, max_rel=11.692480087280273, norm_rel=0.024616457521915436, ref_abs_avg=21.184329986572266, test_abs_avg=21.197914123535156
production_forward grad[60] vs paper_forward: mean_abs=0.646699070930481, max_abs=5.0, mean_rel=0.16606569290161133, max_rel=1493.714111328125, norm_rel=0.02506229653954506, ref_abs_avg=25.83932876586914, test_abs_avg=25.838829040527344
production_forward grad[61] vs paper_forward: mean_abs=0.5999762415885925, max_abs=3.75, mean_rel=0.2143220603466034, max_rel=1812.4998779296875, norm_rel=0.0236095879226923, ref_abs_avg=25.41421890258789, test_abs_avg=25.409448623657227
production_forward grad[62] vs paper_forward: mean_abs=0.49379491806030273, max_abs=1.75, mean_rel=0.11046101152896881, max_rel=14.225994110107422, norm_rel=0.024280758574604988, ref_abs_avg=19.715866088867188, test_abs_avg=19.73239517211914
production_forward grad[63] vs paper_forward: mean_abs=0.6119325160980225, max_abs=5.0, mean_rel=0.15585441887378693, max_rel=551.1781005859375, norm_rel=0.02488272450864315, ref_abs_avg=24.613109588623047, test_abs_avg=24.61219024658203
production_forward grad[64] vs paper_forward: mean_abs=0.5706827640533447, max_abs=4.0, mean_rel=0.24530190229415894, max_rel=1999.9998779296875, norm_rel=0.023323632776737213, ref_abs_avg=24.48543930053711, test_abs_avg=24.483020782470703
production_forward grad[65] vs paper_forward: mean_abs=0.443511962890625, max_abs=1.59375, mean_rel=0.0815894603729248, max_rel=6.150856018066406, norm_rel=0.022331498563289642, ref_abs_avg=19.775279998779297, test_abs_avg=19.764389038085938
production_forward grad[66] vs paper_forward: mean_abs=0.5808446407318115, max_abs=5.0, mean_rel=0.15304473042488098, max_rel=1304.2442626953125, norm_rel=0.02461233362555504, ref_abs_avg=23.66252899169922, test_abs_avg=23.661239624023438
production_forward grad[67] vs paper_forward: mean_abs=0.5404738187789917, max_abs=4.5, mean_rel=0.2334795594215393, max_rel=2093.75, norm_rel=0.0229828879237175, ref_abs_avg=23.53946304321289, test_abs_avg=23.53984832763672
production_forward grad[68] vs paper_forward: mean_abs=0.47090697288513184, max_abs=2.0, mean_rel=0.14919906854629517, max_rel=24.001502990722656, norm_rel=0.025001712143421173, ref_abs_avg=18.64753532409668, test_abs_avg=18.658716201782227
production_forward grad[69] vs paper_forward: mean_abs=0.5542678833007812, max_abs=4.5, mean_rel=0.1477939784526825, max_rel=1376.2266845703125, norm_rel=0.024088401347398758, ref_abs_avg=23.048900604248047, test_abs_avg=23.04827880859375
production_forward grad[70] vs paper_forward: mean_abs=0.5122537612915039, max_abs=3.75, mean_rel=0.21480190753936768, max_rel=1218.75, norm_rel=0.022668451070785522, ref_abs_avg=22.588912963867188, test_abs_avg=22.586904525756836
production_forward grad[71] vs paper_forward: mean_abs=0.4232645034790039, max_abs=1.75, mean_rel=0.08319108188152313, max_rel=4.810271739959717, norm_rel=0.023714086040854454, ref_abs_avg=18.074438095092773, test_abs_avg=18.057117462158203
production_forward grad[72] vs paper_forward: mean_abs=0.529541552066803, max_abs=4.75, mean_rel=0.1476057469844818, max_rel=731.1327514648438, norm_rel=0.023830508813261986, ref_abs_avg=22.27348518371582, test_abs_avg=22.273317337036133
production_forward grad[73] vs paper_forward: mean_abs=0.4866744875907898, max_abs=3.75, mean_rel=0.20158979296684265, max_rel=1125.0, norm_rel=0.021880611777305603, ref_abs_avg=22.204076766967773, test_abs_avg=22.209495544433594
production_forward grad[74] vs paper_forward: mean_abs=0.4806530475616455, max_abs=2.0, mean_rel=0.25141221284866333, max_rel=56.892478942871094, norm_rel=0.023067310452461243, ref_abs_avg=20.882671356201172, test_abs_avg=20.883350372314453
production_forward grad[75] vs paper_forward: mean_abs=0.619985818862915, max_abs=5.0, mean_rel=0.16643047332763672, max_rel=1251.338134765625, norm_rel=0.025148380547761917, ref_abs_avg=24.687454223632812, test_abs_avg=24.68883514404297
production_forward grad[76] vs paper_forward: mean_abs=0.5675946474075317, max_abs=4.0, mean_rel=0.1925417184829712, max_rel=1374.9998779296875, norm_rel=0.023450685665011406, ref_abs_avg=24.217041015625, test_abs_avg=24.216707229614258
production_forward grad[77] vs paper_forward: mean_abs=0.43830013275146484, max_abs=1.75, mean_rel=0.08465898036956787, max_rel=6.04940128326416, norm_rel=0.02356320433318615, ref_abs_avg=18.544273376464844, test_abs_avg=18.547231674194336
production_forward grad[78] vs paper_forward: mean_abs=0.5574196577072144, max_abs=5.0, mean_rel=0.15954901278018951, max_rel=844.4090576171875, norm_rel=0.02464958280324936, ref_abs_avg=22.667020797729492, test_abs_avg=22.667652130126953
production_forward grad[79] vs paper_forward: mean_abs=0.5176754593849182, max_abs=4.4375, mean_rel=0.21794091165065765, max_rel=1250.0, norm_rel=0.023167593404650688, ref_abs_avg=22.364585876464844, test_abs_avg=22.362014770507812
production_forward grad[80] vs paper_forward: mean_abs=0.40109556913375854, max_abs=1.453125, mean_rel=0.07522393763065338, max_rel=3.330676555633545, norm_rel=0.023268437013030052, ref_abs_avg=17.12445640563965, test_abs_avg=17.092376708984375
production_forward grad[81] vs paper_forward: mean_abs=0.5084751844406128, max_abs=4.5, mean_rel=0.1474720537662506, max_rel=656.0960083007812, norm_rel=0.023829558864235878, ref_abs_avg=21.407182693481445, test_abs_avg=21.407411575317383
production_forward grad[82] vs paper_forward: mean_abs=0.4732853174209595, max_abs=3.625, mean_rel=0.20171032845973969, max_rel=1874.9998779296875, norm_rel=0.022067831829190254, ref_abs_avg=21.556543350219727, test_abs_avg=21.549219131469727
production_forward grad[83] vs paper_forward: mean_abs=0.40023088455200195, max_abs=1.53125, mean_rel=0.12053655833005905, max_rel=7.191678047180176, norm_rel=0.022383876144886017, ref_abs_avg=17.493812561035156, test_abs_avg=17.51602554321289
production_forward grad[84] vs paper_forward: mean_abs=0.4832112789154053, max_abs=5.0, mean_rel=0.14869166910648346, max_rel=1519.85400390625, norm_rel=0.02298944815993309, ref_abs_avg=21.098339080810547, test_abs_avg=21.097942352294922
production_forward grad[85] vs paper_forward: mean_abs=0.43710213899612427, max_abs=4.0, mean_rel=0.19338952004909515, max_rel=1125.0, norm_rel=0.021296031773090363, ref_abs_avg=20.573192596435547, test_abs_avg=20.567218780517578
production_forward grad[86] vs paper_forward: mean_abs=0.3286466598510742, max_abs=1.25, mean_rel=0.15065959095954895, max_rel=37.79346466064453, norm_rel=0.01929325796663761, ref_abs_avg=16.941539764404297, test_abs_avg=16.938655853271484
production_forward grad[87] vs paper_forward: mean_abs=0.4400268793106079, max_abs=4.5, mean_rel=0.14150124788284302, max_rel=1495.1728515625, norm_rel=0.022274568676948547, ref_abs_avg=19.849323272705078, test_abs_avg=19.848358154296875
production_forward grad[88] vs paper_forward: mean_abs=0.4090317189693451, max_abs=4.625, mean_rel=0.21414411067962646, max_rel=1718.7498779296875, norm_rel=0.020894674584269524, ref_abs_avg=19.691579818725586, test_abs_avg=19.690025329589844
production_forward grad[89] vs paper_forward: mean_abs=0.3073234558105469, max_abs=1.34375, mean_rel=0.05645526945590973, max_rel=2.0686264038085938, norm_rel=0.018259115517139435, ref_abs_avg=17.040508270263672, test_abs_avg=17.02959442138672
production_forward grad[90] vs paper_forward: mean_abs=0.42027202248573303, max_abs=4.1875, mean_rel=0.13516458868980408, max_rel=974.0303955078125, norm_rel=0.02191118523478508, ref_abs_avg=19.377521514892578, test_abs_avg=19.37621307373047
production_forward grad[91] vs paper_forward: mean_abs=0.380395770072937, max_abs=4.375, mean_rel=0.16581544280052185, max_rel=1414.0623779296875, norm_rel=0.020256394520401955, ref_abs_avg=19.019676208496094, test_abs_avg=19.01527976989746
production_forward grad[92] vs paper_forward: mean_abs=0.30869436264038086, max_abs=1.375, mean_rel=0.08556297421455383, max_rel=4.074711322784424, norm_rel=0.019550500437617302, ref_abs_avg=15.921136856079102, test_abs_avg=15.923690795898438
production_forward grad[93] vs paper_forward: mean_abs=0.3951822519302368, max_abs=6.0, mean_rel=0.124341681599617, max_rel=866.0677490234375, norm_rel=0.021324397996068, ref_abs_avg=18.797901153564453, test_abs_avg=18.797626495361328
production_forward grad[94] vs paper_forward: mean_abs=0.3503378629684448, max_abs=4.0, mean_rel=0.16133973002433777, max_rel=1296.8748779296875, norm_rel=0.018712414428591728, ref_abs_avg=18.880508422851562, test_abs_avg=18.885881423950195
production_forward grad[95] vs paper_forward: mean_abs=0.29520535469055176, max_abs=1.46875, mean_rel=0.07865563035011292, max_rel=7.072845935821533, norm_rel=0.019858378916978836, ref_abs_avg=14.863719940185547, test_abs_avg=14.859872817993164
production_forward grad[96] vs paper_forward: mean_abs=0.36113226413726807, max_abs=4.0, mean_rel=0.12277369946241379, max_rel=780.3638305664062, norm_rel=0.020556148141622543, ref_abs_avg=17.853126525878906, test_abs_avg=17.85384178161621
production_forward grad[97] vs paper_forward: mean_abs=0.3320552110671997, max_abs=4.5, mean_rel=0.16923286020755768, max_rel=1515.6248779296875, norm_rel=0.019071657210588455, ref_abs_avg=17.833831787109375, test_abs_avg=17.842727661132812
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016263399738818407, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00869020726531744, max_abs=0.46875, mean_rel=0.0753793865442276, max_rel=97.6411361694336, norm_rel=0.020742211490869522, ref_abs_avg=0.4541972875595093, test_abs_avg=0.45418643951416016
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.409954071044922, max_abs=50.0, mean_rel=0.14107972383499146, max_rel=96.60841369628906, norm_rel=0.020634569227695465, ref_abs_avg=315.09918212890625, test_abs_avg=315.1029357910156
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3513965606689453, max_abs=4.875, mean_rel=0.12607915699481964, max_rel=15.046103477478027, norm_rel=0.02539948932826519, ref_abs_avg=53.519554138183594, test_abs_avg=53.4757194519043
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6419169902801514, max_abs=14.0, mean_rel=0.1782163679599762, max_rel=3224.30859375, norm_rel=0.025527333840727806, ref_abs_avg=64.72822570800781, test_abs_avg=64.72129821777344
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.516075611114502, max_abs=9.0, mean_rel=0.36768293380737305, max_rel=5062.5, norm_rel=0.023714708164334297, ref_abs_avg=64.17269897460938, test_abs_avg=64.16816711425781
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.2084059715270996, max_abs=4.5, mean_rel=0.12200264632701874, max_rel=15.616668701171875, norm_rel=0.027161072939634323, ref_abs_avg=44.479793548583984, test_abs_avg=44.508968353271484
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4434080123901367, max_abs=9.75, mean_rel=0.16949151456356049, max_rel=2586.693115234375, norm_rel=0.025249313563108444, ref_abs_avg=57.48408508300781, test_abs_avg=57.48385238647461
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3336784839630127, max_abs=8.0, mean_rel=0.3130965828895569, max_rel=3749.999755859375, norm_rel=0.023633869364857674, ref_abs_avg=56.65764617919922, test_abs_avg=56.66075897216797
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0234880447387695, max_abs=3.625, mean_rel=0.07394763827323914, max_rel=2.3985137939453125, norm_rel=0.024491269141435623, ref_abs_avg=41.556983947753906, test_abs_avg=41.60999298095703
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.30424165725708, max_abs=9.0, mean_rel=0.16323739290237427, max_rel=1172.9056396484375, norm_rel=0.02505185827612877, ref_abs_avg=52.361183166503906, test_abs_avg=52.36187744140625
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2032372951507568, max_abs=8.0, mean_rel=0.29608267545700073, max_rel=3093.749755859375, norm_rel=0.023377930745482445, ref_abs_avg=51.70541000366211, test_abs_avg=51.698143005371094
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9928743839263916, max_abs=4.375, mean_rel=0.3047163784503937, max_rel=79.77870178222656, norm_rel=0.025466157123446465, ref_abs_avg=39.22953796386719, test_abs_avg=39.21284484863281
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2047227621078491, max_abs=9.0, mean_rel=0.17302045226097107, max_rel=1650.55126953125, norm_rel=0.024900825694203377, ref_abs_avg=48.67662048339844, test_abs_avg=48.675758361816406
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1169366836547852, max_abs=6.5, mean_rel=0.2889445722103119, max_rel=4375.0, norm_rel=0.023200461640954018, ref_abs_avg=48.36091995239258, test_abs_avg=48.36137771606445
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8735752105712891, max_abs=3.9140625, mean_rel=0.1139916330575943, max_rel=5.0460968017578125, norm_rel=0.022533312439918518, ref_abs_avg=38.114097595214844, test_abs_avg=38.05638122558594
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1264564990997314, max_abs=9.0, mean_rel=0.16648584604263306, max_rel=2050.37548828125, norm_rel=0.02472149394452572, ref_abs_avg=45.83657455444336, test_abs_avg=45.833553314208984
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0412548780441284, max_abs=7.53125, mean_rel=0.32158076763153076, max_rel=3062.499755859375, norm_rel=0.02312624640762806, ref_abs_avg=45.26287841796875, test_abs_avg=45.2562141418457
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8410348892211914, max_abs=3.5, mean_rel=0.0847456157207489, max_rel=5.37252950668335, norm_rel=0.023151742294430733, ref_abs_avg=35.86454391479492, test_abs_avg=35.804622650146484
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.056572437286377, max_abs=8.0, mean_rel=0.16605931520462036, max_rel=1463.005859375, norm_rel=0.02442634478211403, ref_abs_avg=43.50834274291992, test_abs_avg=43.50664520263672
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9791502952575684, max_abs=6.25, mean_rel=0.27484816312789917, max_rel=3406.249755859375, norm_rel=0.022887075319886208, ref_abs_avg=43.004150390625, test_abs_avg=43.003379821777344
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7979986667633057, max_abs=3.5, mean_rel=0.10070031881332397, max_rel=8.237847328186035, norm_rel=0.023828141391277313, ref_abs_avg=32.862335205078125, test_abs_avg=32.852333068847656
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9996931552886963, max_abs=8.0, mean_rel=0.16592463850975037, max_rel=1801.1212158203125, norm_rel=0.024351900443434715, ref_abs_avg=41.29765319824219, test_abs_avg=41.29561233520508
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9239731431007385, max_abs=6.0, mean_rel=0.32374686002731323, max_rel=2718.749755859375, norm_rel=0.022881215438246727, ref_abs_avg=40.547096252441406, test_abs_avg=40.547515869140625
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7528696060180664, max_abs=3.25, mean_rel=0.12400084733963013, max_rel=12.499290466308594, norm_rel=0.022493984550237656, ref_abs_avg=33.0484733581543, test_abs_avg=33.04954528808594
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9561485648155212, max_abs=8.0, mean_rel=0.14797304570674896, max_rel=916.456298828125, norm_rel=0.024139918386936188, ref_abs_avg=39.84115219116211, test_abs_avg=39.840633392333984
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8791458606719971, max_abs=5.0, mean_rel=0.32476282119750977, max_rel=3374.999755859375, norm_rel=0.02255292981863022, ref_abs_avg=39.149497985839844, test_abs_avg=39.14852523803711
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8355598449707031, max_abs=4.25, mean_rel=0.09420359879732132, max_rel=7.174096584320068, norm_rel=0.025630539283156395, ref_abs_avg=33.42749786376953, test_abs_avg=33.379905700683594
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1015335321426392, max_abs=8.0, mean_rel=0.18143635988235474, max_rel=1336.2916259765625, norm_rel=0.02625160850584507, ref_abs_avg=42.20874786376953, test_abs_avg=42.20965576171875
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0295522212982178, max_abs=6.75, mean_rel=0.286387175321579, max_rel=2812.499755859375, norm_rel=0.024686556309461594, ref_abs_avg=41.9300537109375, test_abs_avg=41.94374465942383
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8326315879821777, max_abs=3.25, mean_rel=0.08097422868013382, max_rel=5.4997053146362305, norm_rel=0.025749213993549347, ref_abs_avg=32.34141540527344, test_abs_avg=32.353519439697266
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0242772102355957, max_abs=6.75, mean_rel=0.1763264387845993, max_rel=1446.0560302734375, norm_rel=0.026475461199879646, ref_abs_avg=38.836814880371094, test_abs_avg=38.83392333984375
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9538978338241577, max_abs=6.125, mean_rel=0.31096410751342773, max_rel=3218.749755859375, norm_rel=0.02502644620835781, ref_abs_avg=38.275840759277344, test_abs_avg=38.285400390625
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7329673767089844, max_abs=2.859375, mean_rel=0.09824249148368835, max_rel=4.735316753387451, norm_rel=0.025543374940752983, ref_abs_avg=29.622373580932617, test_abs_avg=29.598913192749023
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9433766603469849, max_abs=6.4375, mean_rel=0.17053638398647308, max_rel=1134.0361328125, norm_rel=0.02629934437572956, ref_abs_avg=36.02243423461914, test_abs_avg=36.02362060546875
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.887096643447876, max_abs=6.0, mean_rel=0.2744060158729553, max_rel=3312.499755859375, norm_rel=0.02511235885322094, ref_abs_avg=35.48280334472656, test_abs_avg=35.478919982910156
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7169064283370972, max_abs=3.0, mean_rel=0.13013491034507751, max_rel=23.59946632385254, norm_rel=0.026218123733997345, ref_abs_avg=27.127288818359375, test_abs_avg=26.996562957763672
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8878985047340393, max_abs=6.0, mean_rel=0.1634434163570404, max_rel=1404.5101318359375, norm_rel=0.026217801496386528, ref_abs_avg=33.98979568481445, test_abs_avg=33.98774337768555
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8335530161857605, max_abs=4.875, mean_rel=0.32910478115081787, max_rel=2812.499755859375, norm_rel=0.024703064933419228, ref_abs_avg=33.86103820800781, test_abs_avg=33.860816955566406
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6310977935791016, max_abs=2.625, mean_rel=0.06775137037038803, max_rel=2.7316653728485107, norm_rel=0.02289806678891182, ref_abs_avg=27.46291732788086, test_abs_avg=27.458765029907227
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8368021845817566, max_abs=5.5, mean_rel=0.1706101894378662, max_rel=1097.675537109375, norm_rel=0.025846410542726517, ref_abs_avg=32.48259353637695, test_abs_avg=32.478572845458984
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7829259634017944, max_abs=5.0625, mean_rel=0.3258976936340332, max_rel=2624.999755859375, norm_rel=0.024537909775972366, ref_abs_avg=32.00001525878906, test_abs_avg=32.00111770629883
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.5661172866821289, max_abs=2.5, mean_rel=0.07455487549304962, max_rel=5.351099014282227, norm_rel=0.021368371322751045, ref_abs_avg=27.41006851196289, test_abs_avg=27.37864112854004
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7976507544517517, max_abs=5.5, mean_rel=0.17397381365299225, max_rel=1688.42431640625, norm_rel=0.025669433176517487, ref_abs_avg=31.19741439819336, test_abs_avg=31.195453643798828
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7410244345664978, max_abs=4.5, mean_rel=0.27904972434043884, max_rel=1781.2498779296875, norm_rel=0.024219810962677002, ref_abs_avg=30.64480209350586, test_abs_avg=30.638996124267578
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5929498672485352, max_abs=2.375, mean_rel=0.28490757942199707, max_rel=62.80250549316406, norm_rel=0.024630753323435783, ref_abs_avg=24.107025146484375, test_abs_avg=24.14829444885254
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7528078556060791, max_abs=6.0, mean_rel=0.16182348132133484, max_rel=929.427001953125, norm_rel=0.025470757856965065, ref_abs_avg=29.65782928466797, test_abs_avg=29.6550235748291
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7069728374481201, max_abs=4.25, mean_rel=0.35245582461357117, max_rel=2109.375, norm_rel=0.023962561041116714, ref_abs_avg=29.573226928710938, test_abs_avg=29.57132911682129
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5660386085510254, max_abs=2.625, mean_rel=0.07670526951551437, max_rel=1.8210992813110352, norm_rel=0.023575076833367348, ref_abs_avg=24.205772399902344, test_abs_avg=24.198490142822266
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7224273681640625, max_abs=5.25, mean_rel=0.16384579241275787, max_rel=880.8001708984375, norm_rel=0.025183554738759995, ref_abs_avg=28.772140502929688, test_abs_avg=28.771699905395508
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6746265888214111, max_abs=4.25, mean_rel=0.26296859979629517, max_rel=2546.875, norm_rel=0.023611973971128464, ref_abs_avg=28.587627410888672, test_abs_avg=28.589866638183594
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6280028820037842, max_abs=2.40625, mean_rel=0.1296500265598297, max_rel=23.693931579589844, norm_rel=0.024820435792207718, ref_abs_avg=25.782583236694336, test_abs_avg=25.79627227783203
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8164107203483582, max_abs=5.625, mean_rel=0.16772769391536713, max_rel=949.3310546875, norm_rel=0.02692118100821972, ref_abs_avg=30.43933868408203, test_abs_avg=30.440200805664062
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7611695528030396, max_abs=4.5, mean_rel=0.26214805245399475, max_rel=1968.7498779296875, norm_rel=0.025244873017072678, ref_abs_avg=30.176952362060547, test_abs_avg=30.181623458862305
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5708761215209961, max_abs=2.375, mean_rel=0.09261749684810638, max_rel=9.581411361694336, norm_rel=0.024909261614084244, ref_abs_avg=23.296977996826172, test_abs_avg=23.29839324951172
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7436025142669678, max_abs=5.0625, mean_rel=0.16387319564819336, max_rel=2081.83349609375, norm_rel=0.026293348520994186, ref_abs_avg=28.34943199157715, test_abs_avg=28.348262786865234
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7010495662689209, max_abs=5.0, mean_rel=0.2863624691963196, max_rel=1687.4998779296875, norm_rel=0.02472265437245369, ref_abs_avg=28.36273956298828, test_abs_avg=28.363140106201172
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5639548301696777, max_abs=2.5, mean_rel=0.09979094564914703, max_rel=3.747079610824585, norm_rel=0.02487734705209732, ref_abs_avg=22.581302642822266, test_abs_avg=22.592084884643555
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.695006787776947, max_abs=4.5, mean_rel=0.16765430569648743, max_rel=885.4375610351562, norm_rel=0.02580755390226841, ref_abs_avg=26.99629020690918, test_abs_avg=26.993555068969727
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6508080959320068, max_abs=4.25, mean_rel=0.2527007758617401, max_rel=2531.25, norm_rel=0.024445755407214165, ref_abs_avg=26.683412551879883, test_abs_avg=26.6807918548584
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5050945281982422, max_abs=2.25, mean_rel=0.09929652512073517, max_rel=10.31689453125, norm_rel=0.025105254724621773, ref_abs_avg=21.184329986572266, test_abs_avg=21.20772361755371
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6547250747680664, max_abs=4.5, mean_rel=0.1708495020866394, max_rel=1381.3270263671875, norm_rel=0.025362243875861168, ref_abs_avg=25.83932876586914, test_abs_avg=25.83824920654297
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6059365272521973, max_abs=4.125, mean_rel=0.2241274118423462, max_rel=1999.9998779296875, norm_rel=0.023839324712753296, ref_abs_avg=25.41421890258789, test_abs_avg=25.412263870239258
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4809713363647461, max_abs=1.625, mean_rel=0.13568532466888428, max_rel=18.588165283203125, norm_rel=0.024061251431703568, ref_abs_avg=19.715866088867188, test_abs_avg=19.72578239440918
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.618809700012207, max_abs=4.25, mean_rel=0.1593323051929474, max_rel=1240.1759033203125, norm_rel=0.02514956332743168, ref_abs_avg=24.613109588623047, test_abs_avg=24.613361358642578
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5772205591201782, max_abs=4.5, mean_rel=0.23949559032917023, max_rel=1499.9998779296875, norm_rel=0.023590482771396637, ref_abs_avg=24.48543930053711, test_abs_avg=24.484739303588867
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4611077308654785, max_abs=1.625, mean_rel=0.08783397823572159, max_rel=7.40911340713501, norm_rel=0.023083506152033806, ref_abs_avg=19.775279998779297, test_abs_avg=19.779964447021484
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.587613582611084, max_abs=4.5, mean_rel=0.1548742949962616, max_rel=1250.8551025390625, norm_rel=0.024875003844499588, ref_abs_avg=23.66252899169922, test_abs_avg=23.66179656982422
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5467787981033325, max_abs=4.75, mean_rel=0.23311686515808105, max_rel=1874.9998779296875, norm_rel=0.023258822038769722, ref_abs_avg=23.53946304321289, test_abs_avg=23.53929901123047
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4503132700920105, max_abs=1.75, mean_rel=0.13850603997707367, max_rel=23.633956909179688, norm_rel=0.024064529687166214, ref_abs_avg=18.64753532409668, test_abs_avg=18.63455581665039
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5597147941589355, max_abs=5.0, mean_rel=0.150559663772583, max_rel=1013.172119140625, norm_rel=0.02431386709213257, ref_abs_avg=23.048900604248047, test_abs_avg=23.048328399658203
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5188309550285339, max_abs=3.5, mean_rel=0.21053171157836914, max_rel=1406.2498779296875, norm_rel=0.022961212322115898, ref_abs_avg=22.588912963867188, test_abs_avg=22.586759567260742
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4129371643066406, max_abs=1.875, mean_rel=0.07432132959365845, max_rel=3.9844768047332764, norm_rel=0.023417366668581963, ref_abs_avg=18.074438095092773, test_abs_avg=18.057119369506836
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5341675281524658, max_abs=4.25, mean_rel=0.14828403294086456, max_rel=920.427978515625, norm_rel=0.02402660809457302, ref_abs_avg=22.27348518371582, test_abs_avg=22.272899627685547
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4882630705833435, max_abs=4.0, mean_rel=0.19513151049613953, max_rel=1218.75, norm_rel=0.02194344438612461, ref_abs_avg=22.204076766967773, test_abs_avg=22.210145950317383
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4852387309074402, max_abs=1.9375, mean_rel=0.19588756561279297, max_rel=51.8071174621582, norm_rel=0.023457489907741547, ref_abs_avg=20.882671356201172, test_abs_avg=20.890485763549805
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.6267095804214478, max_abs=5.0, mean_rel=0.16994819045066833, max_rel=1437.2637939453125, norm_rel=0.02541409432888031, ref_abs_avg=24.687454223632812, test_abs_avg=24.687484741210938
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.577668309211731, max_abs=4.5, mean_rel=0.20852872729301453, max_rel=1718.7498779296875, norm_rel=0.02385464496910572, ref_abs_avg=24.217041015625, test_abs_avg=24.21783447265625
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4673604965209961, max_abs=1.75, mean_rel=0.09208039939403534, max_rel=5.697253227233887, norm_rel=0.024996528401970863, ref_abs_avg=18.544273376464844, test_abs_avg=18.551097869873047
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5639888048171997, max_abs=5.0, mean_rel=0.1704118847846985, max_rel=1218.4110107421875, norm_rel=0.02491433173418045, ref_abs_avg=22.667020797729492, test_abs_avg=22.666244506835938
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5258141756057739, max_abs=3.75, mean_rel=0.2162589728832245, max_rel=1437.4998779296875, norm_rel=0.023546064272522926, ref_abs_avg=22.364585876464844, test_abs_avg=22.363975524902344
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.40169334411621094, max_abs=1.625, mean_rel=0.07537885010242462, max_rel=3.159872531890869, norm_rel=0.02348833717405796, ref_abs_avg=17.12445640563965, test_abs_avg=17.109935760498047
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5136577486991882, max_abs=4.0, mean_rel=0.14880573749542236, max_rel=663.8905029296875, norm_rel=0.02407011389732361, ref_abs_avg=21.407182693481445, test_abs_avg=21.40765380859375
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.47886234521865845, max_abs=4.40625, mean_rel=0.2103528082370758, max_rel=1874.9998779296875, norm_rel=0.02229667827486992, ref_abs_avg=21.556543350219727, test_abs_avg=21.552968978881836
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3927128314971924, max_abs=1.5, mean_rel=0.12535898387432098, max_rel=10.598100662231445, norm_rel=0.022669175639748573, ref_abs_avg=17.493812561035156, test_abs_avg=17.525705337524414
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4889599680900574, max_abs=5.0, mean_rel=0.1443493664264679, max_rel=1303.483154296875, norm_rel=0.023268593475222588, ref_abs_avg=21.098339080810547, test_abs_avg=21.09843635559082
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.43876639008522034, max_abs=4.375, mean_rel=0.18953359127044678, max_rel=1531.2498779296875, norm_rel=0.021363098174333572, ref_abs_avg=20.573192596435547, test_abs_avg=20.567279815673828
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.343533992767334, max_abs=1.375, mean_rel=0.13604967296123505, max_rel=32.8203010559082, norm_rel=0.02061358466744423, ref_abs_avg=16.941539764404297, test_abs_avg=16.94184112548828
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4436059594154358, max_abs=5.0, mean_rel=0.1426146924495697, max_rel=1567.531005859375, norm_rel=0.02245623990893364, ref_abs_avg=19.849323272705078, test_abs_avg=19.848670959472656
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.41079914569854736, max_abs=5.1875, mean_rel=0.20556685328483582, max_rel=1937.4998779296875, norm_rel=0.020965157076716423, ref_abs_avg=19.691579818725586, test_abs_avg=19.68905258178711
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.30028462409973145, max_abs=1.25, mean_rel=0.054793309420347214, max_rel=1.7793079614639282, norm_rel=0.01826045848429203, ref_abs_avg=17.040508270263672, test_abs_avg=17.036357879638672
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.42221134901046753, max_abs=5.0, mean_rel=0.13307002186775208, max_rel=718.8596801757812, norm_rel=0.02198309823870659, ref_abs_avg=19.377521514892578, test_abs_avg=19.376407623291016
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3814725875854492, max_abs=4.375, mean_rel=0.16637936234474182, max_rel=1468.7498779296875, norm_rel=0.0202969778329134, ref_abs_avg=19.019676208496094, test_abs_avg=19.015609741210938
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3181875944137573, max_abs=1.375, mean_rel=0.08601387590169907, max_rel=4.251872539520264, norm_rel=0.020210841670632362, ref_abs_avg=15.921136856079102, test_abs_avg=15.911840438842773
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.39684098958969116, max_abs=4.5, mean_rel=0.12696939706802368, max_rel=979.0119018554688, norm_rel=0.021405072882771492, ref_abs_avg=18.797901153564453, test_abs_avg=18.798337936401367
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3549065589904785, max_abs=4.25, mean_rel=0.16821877658367157, max_rel=1398.4373779296875, norm_rel=0.018987877294421196, ref_abs_avg=18.880508422851562, test_abs_avg=18.887388229370117
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.29875874519348145, max_abs=1.1875, mean_rel=0.06968417018651962, max_rel=4.607539176940918, norm_rel=0.019999636337161064, ref_abs_avg=14.863719940185547, test_abs_avg=14.862899780273438
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3616025447845459, max_abs=4.5, mean_rel=0.12231563031673431, max_rel=822.1748046875, norm_rel=0.020585976541042328, ref_abs_avg=17.853126525878906, test_abs_avg=17.853853225708008
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.335114449262619, max_abs=4.5, mean_rel=0.17101836204528809, max_rel=1515.6248779296875, norm_rel=0.019234325736761093, ref_abs_avg=17.833831787109375, test_abs_avg=17.843687057495117
production_forward2 vs paper_forward output: mean_abs=0.0016212068730965257, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008344028145074844, max_abs=0.4375, mean_rel=0.07268735766410828, max_rel=98.69426727294922, norm_rel=0.020024564117193222, ref_abs_avg=0.4541972875595093, test_abs_avg=0.45420145988464355
production_forward2 grad[1] vs paper_forward: mean_abs=7.329097747802734, max_abs=54.0, mean_rel=0.14879325032234192, max_rel=212.13546752929688, norm_rel=0.020389627665281296, ref_abs_avg=315.09918212890625, test_abs_avg=315.12353515625
production_forward2 grad[2] vs paper_forward: mean_abs=1.283975601196289, max_abs=5.0, mean_rel=0.10234296321868896, max_rel=8.436094284057617, norm_rel=0.023922128602862358, ref_abs_avg=53.519554138183594, test_abs_avg=53.55940628051758
production_forward2 grad[3] vs paper_forward: mean_abs=1.5917468070983887, max_abs=11.0, mean_rel=0.17731627821922302, max_rel=3551.38671875, norm_rel=0.024754267185926437, ref_abs_avg=64.72822570800781, test_abs_avg=64.72511291503906
production_forward2 grad[4] vs paper_forward: mean_abs=1.4666087627410889, max_abs=9.0, mean_rel=0.38466107845306396, max_rel=3874.999755859375, norm_rel=0.022946391254663467, ref_abs_avg=64.17269897460938, test_abs_avg=64.16661071777344
production_forward2 grad[5] vs paper_forward: mean_abs=1.1690492630004883, max_abs=5.5, mean_rel=0.14244158565998077, max_rel=22.54753875732422, norm_rel=0.02608860842883587, ref_abs_avg=44.479793548583984, test_abs_avg=44.50702667236328
production_forward2 grad[6] vs paper_forward: mean_abs=1.3975131511688232, max_abs=9.0, mean_rel=0.16382145881652832, max_rel=1981.9376220703125, norm_rel=0.024466564878821373, ref_abs_avg=57.48408508300781, test_abs_avg=57.48663330078125
production_forward2 grad[7] vs paper_forward: mean_abs=1.2864971160888672, max_abs=7.375, mean_rel=0.31657958030700684, max_rel=4187.5, norm_rel=0.02281818352639675, ref_abs_avg=56.65764617919922, test_abs_avg=56.6588134765625
production_forward2 grad[8] vs paper_forward: mean_abs=1.003499984741211, max_abs=4.5, mean_rel=0.07611415535211563, max_rel=3.493788957595825, norm_rel=0.02377384901046753, ref_abs_avg=41.556983947753906, test_abs_avg=41.61647033691406
production_forward2 grad[9] vs paper_forward: mean_abs=1.2653226852416992, max_abs=8.0, mean_rel=0.15823781490325928, max_rel=1272.6239013671875, norm_rel=0.024324363097548485, ref_abs_avg=52.361183166503906, test_abs_avg=52.362945556640625
production_forward2 grad[10] vs paper_forward: mean_abs=1.1612536907196045, max_abs=7.28125, mean_rel=0.29891204833984375, max_rel=3124.999755859375, norm_rel=0.022574149072170258, ref_abs_avg=51.70541000366211, test_abs_avg=51.69804382324219
production_forward2 grad[11] vs paper_forward: mean_abs=0.9274752140045166, max_abs=3.625, mean_rel=0.3208664655685425, max_rel=92.37419128417969, norm_rel=0.024142101407051086, ref_abs_avg=39.22953796386719, test_abs_avg=39.2235107421875
production_forward2 grad[12] vs paper_forward: mean_abs=1.1721596717834473, max_abs=8.0, mean_rel=0.15834304690361023, max_rel=1034.838623046875, norm_rel=0.024240171536803246, ref_abs_avg=48.67662048339844, test_abs_avg=48.67596435546875
production_forward2 grad[13] vs paper_forward: mean_abs=1.0828213691711426, max_abs=6.75, mean_rel=0.2590213418006897, max_rel=3062.499755859375, norm_rel=0.02250564470887184, ref_abs_avg=48.36091995239258, test_abs_avg=48.36450958251953
production_forward2 grad[14] vs paper_forward: mean_abs=0.8487882614135742, max_abs=4.0, mean_rel=0.09867718815803528, max_rel=4.948365211486816, norm_rel=0.02183820866048336, ref_abs_avg=38.114097595214844, test_abs_avg=38.038238525390625
production_forward2 grad[15] vs paper_forward: mean_abs=1.096403956413269, max_abs=8.0, mean_rel=0.1589011549949646, max_rel=2371.199462890625, norm_rel=0.024084191769361496, ref_abs_avg=45.83657455444336, test_abs_avg=45.83586120605469
production_forward2 grad[16] vs paper_forward: mean_abs=1.0107853412628174, max_abs=6.25, mean_rel=0.30688029527664185, max_rel=3312.499755859375, norm_rel=0.022458083927631378, ref_abs_avg=45.26287841796875, test_abs_avg=45.254486083984375
production_forward2 grad[17] vs paper_forward: mean_abs=0.8099145889282227, max_abs=3.5, mean_rel=0.0954168289899826, max_rel=11.202981948852539, norm_rel=0.02254973165690899, ref_abs_avg=35.86454391479492, test_abs_avg=35.813926696777344
production_forward2 grad[18] vs paper_forward: mean_abs=1.0316177606582642, max_abs=6.5, mean_rel=0.1594933569431305, max_rel=2233.74560546875, norm_rel=0.023851510137319565, ref_abs_avg=43.50834274291992, test_abs_avg=43.50830078125
production_forward2 grad[19] vs paper_forward: mean_abs=0.9529284238815308, max_abs=6.0, mean_rel=0.2591550350189209, max_rel=2874.999755859375, norm_rel=0.02226700447499752, ref_abs_avg=43.004150390625, test_abs_avg=43.006683349609375
production_forward2 grad[20] vs paper_forward: mean_abs=0.7961905002593994, max_abs=3.0, mean_rel=0.14529083669185638, max_rel=27.723936080932617, norm_rel=0.023296896368265152, ref_abs_avg=32.862335205078125, test_abs_avg=32.86444854736328
production_forward2 grad[21] vs paper_forward: mean_abs=0.9750512838363647, max_abs=7.0, mean_rel=0.16184721887111664, max_rel=1223.3934326171875, norm_rel=0.023769458755850792, ref_abs_avg=41.29765319824219, test_abs_avg=41.296546936035156
production_forward2 grad[22] vs paper_forward: mean_abs=0.8994529247283936, max_abs=5.75, mean_rel=0.33935630321502686, max_rel=2937.499755859375, norm_rel=0.022267252206802368, ref_abs_avg=40.547096252441406, test_abs_avg=40.55076217651367
production_forward2 grad[23] vs paper_forward: mean_abs=0.7014451026916504, max_abs=2.875, mean_rel=0.10183396190404892, max_rel=8.774178504943848, norm_rel=0.020978856831789017, ref_abs_avg=33.0484733581543, test_abs_avg=33.030799865722656
production_forward2 grad[24] vs paper_forward: mean_abs=0.9339953660964966, max_abs=7.0, mean_rel=0.14536529779434204, max_rel=928.9420776367188, norm_rel=0.023611342534422874, ref_abs_avg=39.84115219116211, test_abs_avg=39.84117889404297
production_forward2 grad[25] vs paper_forward: mean_abs=0.8556268215179443, max_abs=5.0, mean_rel=0.33055374026298523, max_rel=3062.499755859375, norm_rel=0.0219579990953207, ref_abs_avg=39.149497985839844, test_abs_avg=39.148292541503906
production_forward2 grad[26] vs paper_forward: mean_abs=0.7898006439208984, max_abs=3.75, mean_rel=0.07476109266281128, max_rel=4.633105278015137, norm_rel=0.023860814049839973, ref_abs_avg=33.42749786376953, test_abs_avg=33.38949203491211
production_forward2 grad[27] vs paper_forward: mean_abs=1.0766994953155518, max_abs=7.0, mean_rel=0.1781996786594391, max_rel=1799.162109375, norm_rel=0.025657901540398598, ref_abs_avg=42.20874786376953, test_abs_avg=42.21043395996094
production_forward2 grad[28] vs paper_forward: mean_abs=1.0021271705627441, max_abs=6.5, mean_rel=0.2868417799472809, max_rel=2406.25, norm_rel=0.02401801571249962, ref_abs_avg=41.9300537109375, test_abs_avg=41.94843292236328
production_forward2 grad[29] vs paper_forward: mean_abs=0.7836956977844238, max_abs=3.25, mean_rel=0.10048027336597443, max_rel=17.214183807373047, norm_rel=0.024264948442578316, ref_abs_avg=32.34141540527344, test_abs_avg=32.355384826660156
production_forward2 grad[30] vs paper_forward: mean_abs=1.0031943321228027, max_abs=7.6953125, mean_rel=0.1734468638896942, max_rel=1309.8624267578125, norm_rel=0.02594907581806183, ref_abs_avg=38.836814880371094, test_abs_avg=38.83595657348633
production_forward2 grad[31] vs paper_forward: mean_abs=0.9341509342193604, max_abs=6.0, mean_rel=0.2903285026550293, max_rel=3609.374755859375, norm_rel=0.024503078311681747, ref_abs_avg=38.275840759277344, test_abs_avg=38.28383255004883
production_forward2 grad[32] vs paper_forward: mean_abs=0.7152862548828125, max_abs=3.5625, mean_rel=0.08715416491031647, max_rel=3.6978282928466797, norm_rel=0.024636603891849518, ref_abs_avg=29.622373580932617, test_abs_avg=29.61930274963379
production_forward2 grad[33] vs paper_forward: mean_abs=0.9253224730491638, max_abs=6.5, mean_rel=0.17275016009807587, max_rel=1377.0108642578125, norm_rel=0.025804489850997925, ref_abs_avg=36.02243423461914, test_abs_avg=36.02223587036133
production_forward2 grad[34] vs paper_forward: mean_abs=0.8677273392677307, max_abs=5.5, mean_rel=0.2750721871852875, max_rel=3124.999755859375, norm_rel=0.024537744000554085, ref_abs_avg=35.48280334472656, test_abs_avg=35.47732925415039
production_forward2 grad[35] vs paper_forward: mean_abs=0.6611151695251465, max_abs=2.75, mean_rel=0.13399966061115265, max_rel=24.639305114746094, norm_rel=0.024600563570857048, ref_abs_avg=27.127288818359375, test_abs_avg=27.00399398803711
production_forward2 grad[36] vs paper_forward: mean_abs=0.8726345300674438, max_abs=7.0, mean_rel=0.16604726016521454, max_rel=1380.81982421875, norm_rel=0.02573946863412857, ref_abs_avg=33.98979568481445, test_abs_avg=33.98751449584961
production_forward2 grad[37] vs paper_forward: mean_abs=0.8155354857444763, max_abs=4.75, mean_rel=0.3240383267402649, max_rel=2937.499755859375, norm_rel=0.02417689561843872, ref_abs_avg=33.86103820800781, test_abs_avg=33.86182403564453
production_forward2 grad[38] vs paper_forward: mean_abs=0.6208219528198242, max_abs=2.125, mean_rel=0.06380072981119156, max_rel=2.360760450363159, norm_rel=0.02271498553454876, ref_abs_avg=27.46291732788086, test_abs_avg=27.451446533203125
production_forward2 grad[39] vs paper_forward: mean_abs=0.8223017454147339, max_abs=5.5, mean_rel=0.17056985199451447, max_rel=1450.9259033203125, norm_rel=0.02540578320622444, ref_abs_avg=32.48259353637695, test_abs_avg=32.4790153503418
production_forward2 grad[40] vs paper_forward: mean_abs=0.7687215209007263, max_abs=5.0, mean_rel=0.30768275260925293, max_rel=2437.5, norm_rel=0.02409197948873043, ref_abs_avg=32.00001525878906, test_abs_avg=32.00061798095703
production_forward2 grad[41] vs paper_forward: mean_abs=0.590721607208252, max_abs=2.25, mean_rel=0.09310197830200195, max_rel=11.022985458374023, norm_rel=0.021916259080171585, ref_abs_avg=27.41006851196289, test_abs_avg=27.3724365234375
production_forward2 grad[42] vs paper_forward: mean_abs=0.7849168181419373, max_abs=5.5, mean_rel=0.1652100533246994, max_rel=1265.0498046875, norm_rel=0.025279782712459564, ref_abs_avg=31.19741439819336, test_abs_avg=31.19563865661621
production_forward2 grad[43] vs paper_forward: mean_abs=0.7274484038352966, max_abs=4.5, mean_rel=0.27322906255722046, max_rel=2031.2498779296875, norm_rel=0.02379138395190239, ref_abs_avg=30.64480209350586, test_abs_avg=30.63745880126953
production_forward2 grad[44] vs paper_forward: mean_abs=0.5976477861404419, max_abs=2.421875, mean_rel=0.3248198330402374, max_rel=102.18528747558594, norm_rel=0.02478887140750885, ref_abs_avg=24.107025146484375, test_abs_avg=24.154850006103516
production_forward2 grad[45] vs paper_forward: mean_abs=0.7413941025733948, max_abs=5.5, mean_rel=0.1536407321691513, max_rel=885.0523681640625, norm_rel=0.02510041743516922, ref_abs_avg=29.65782928466797, test_abs_avg=29.655357360839844
production_forward2 grad[46] vs paper_forward: mean_abs=0.6943999528884888, max_abs=4.25, mean_rel=0.34551379084587097, max_rel=2203.125, norm_rel=0.02354259602725506, ref_abs_avg=29.573226928710938, test_abs_avg=29.571197509765625
production_forward2 grad[47] vs paper_forward: mean_abs=0.5664117336273193, max_abs=2.5, mean_rel=0.08495815098285675, max_rel=3.8985793590545654, norm_rel=0.023611944168806076, ref_abs_avg=24.205772399902344, test_abs_avg=24.227020263671875
production_forward2 grad[48] vs paper_forward: mean_abs=0.7127172946929932, max_abs=5.0, mean_rel=0.16166076064109802, max_rel=1036.3756103515625, norm_rel=0.024851126596331596, ref_abs_avg=28.772140502929688, test_abs_avg=28.771610260009766
production_forward2 grad[49] vs paper_forward: mean_abs=0.6650872230529785, max_abs=4.125, mean_rel=0.26161283254623413, max_rel=2312.5, norm_rel=0.023279838263988495, ref_abs_avg=28.587627410888672, test_abs_avg=28.58979034423828
production_forward2 grad[50] vs paper_forward: mean_abs=0.6094455718994141, max_abs=2.5, mean_rel=0.1482551395893097, max_rel=23.998554229736328, norm_rel=0.024226924404501915, ref_abs_avg=25.782583236694336, test_abs_avg=25.78582191467285
production_forward2 grad[51] vs paper_forward: mean_abs=0.802557110786438, max_abs=5.5, mean_rel=0.16548851132392883, max_rel=1068.1683349609375, norm_rel=0.02645450085401535, ref_abs_avg=30.43933868408203, test_abs_avg=30.441543579101562
production_forward2 grad[52] vs paper_forward: mean_abs=0.74439936876297, max_abs=5.1875, mean_rel=0.24199777841567993, max_rel=2140.625, norm_rel=0.024686431512236595, ref_abs_avg=30.176952362060547, test_abs_avg=30.178354263305664
production_forward2 grad[53] vs paper_forward: mean_abs=0.5533790588378906, max_abs=2.375, mean_rel=0.09177368879318237, max_rel=8.89439582824707, norm_rel=0.023980529978871346, ref_abs_avg=23.296977996826172, test_abs_avg=23.295482635498047
production_forward2 grad[54] vs paper_forward: mean_abs=0.7320895195007324, max_abs=5.0, mean_rel=0.16471588611602783, max_rel=2172.346435546875, norm_rel=0.02588881179690361, ref_abs_avg=28.34943199157715, test_abs_avg=28.349044799804688
production_forward2 grad[55] vs paper_forward: mean_abs=0.6875449419021606, max_abs=5.0, mean_rel=0.2799258232116699, max_rel=2406.25, norm_rel=0.024235257878899574, ref_abs_avg=28.36273956298828, test_abs_avg=28.36337661743164
production_forward2 grad[56] vs paper_forward: mean_abs=0.5368661880493164, max_abs=3.0, mean_rel=0.08284725993871689, max_rel=2.693014144897461, norm_rel=0.024052176624536514, ref_abs_avg=22.581302642822266, test_abs_avg=22.585811614990234
production_forward2 grad[57] vs paper_forward: mean_abs=0.6863005757331848, max_abs=5.5, mean_rel=0.1650591343641281, max_rel=936.3267822265625, norm_rel=0.025476766750216484, ref_abs_avg=26.99629020690918, test_abs_avg=26.9942626953125
production_forward2 grad[58] vs paper_forward: mean_abs=0.6403063535690308, max_abs=4.75, mean_rel=0.241871640086174, max_rel=2062.5, norm_rel=0.024057630449533463, ref_abs_avg=26.683412551879883, test_abs_avg=26.68448257446289
production_forward2 grad[59] vs paper_forward: mean_abs=0.49406909942626953, max_abs=2.5, mean_rel=0.09777121990919113, max_rel=11.692480087280273, norm_rel=0.024616457521915436, ref_abs_avg=21.184329986572266, test_abs_avg=21.197914123535156
production_forward2 grad[60] vs paper_forward: mean_abs=0.646699070930481, max_abs=5.0, mean_rel=0.16606569290161133, max_rel=1493.714111328125, norm_rel=0.02506229653954506, ref_abs_avg=25.83932876586914, test_abs_avg=25.838829040527344
production_forward2 grad[61] vs paper_forward: mean_abs=0.5999762415885925, max_abs=3.75, mean_rel=0.2143220603466034, max_rel=1812.4998779296875, norm_rel=0.0236095879226923, ref_abs_avg=25.41421890258789, test_abs_avg=25.409448623657227
production_forward2 grad[62] vs paper_forward: mean_abs=0.49379491806030273, max_abs=1.75, mean_rel=0.11046101152896881, max_rel=14.225994110107422, norm_rel=0.024280758574604988, ref_abs_avg=19.715866088867188, test_abs_avg=19.73239517211914
production_forward2 grad[63] vs paper_forward: mean_abs=0.6119325160980225, max_abs=5.0, mean_rel=0.15585441887378693, max_rel=551.1781005859375, norm_rel=0.02488272450864315, ref_abs_avg=24.613109588623047, test_abs_avg=24.61219024658203
production_forward2 grad[64] vs paper_forward: mean_abs=0.5706827640533447, max_abs=4.0, mean_rel=0.24530190229415894, max_rel=1999.9998779296875, norm_rel=0.023323632776737213, ref_abs_avg=24.48543930053711, test_abs_avg=24.483020782470703
production_forward2 grad[65] vs paper_forward: mean_abs=0.443511962890625, max_abs=1.59375, mean_rel=0.0815894603729248, max_rel=6.150856018066406, norm_rel=0.022331498563289642, ref_abs_avg=19.775279998779297, test_abs_avg=19.764389038085938
production_forward2 grad[66] vs paper_forward: mean_abs=0.5808446407318115, max_abs=5.0, mean_rel=0.15304473042488098, max_rel=1304.2442626953125, norm_rel=0.02461233362555504, ref_abs_avg=23.66252899169922, test_abs_avg=23.661239624023438
production_forward2 grad[67] vs paper_forward: mean_abs=0.5404738187789917, max_abs=4.5, mean_rel=0.2334795594215393, max_rel=2093.75, norm_rel=0.0229828879237175, ref_abs_avg=23.53946304321289, test_abs_avg=23.53984832763672
production_forward2 grad[68] vs paper_forward: mean_abs=0.47090697288513184, max_abs=2.0, mean_rel=0.14919906854629517, max_rel=24.001502990722656, norm_rel=0.025001712143421173, ref_abs_avg=18.64753532409668, test_abs_avg=18.658716201782227
production_forward2 grad[69] vs paper_forward: mean_abs=0.5542678833007812, max_abs=4.5, mean_rel=0.1477939784526825, max_rel=1376.2266845703125, norm_rel=0.024088401347398758, ref_abs_avg=23.048900604248047, test_abs_avg=23.04827880859375
production_forward2 grad[70] vs paper_forward: mean_abs=0.5122537612915039, max_abs=3.75, mean_rel=0.21480190753936768, max_rel=1218.75, norm_rel=0.022668451070785522, ref_abs_avg=22.588912963867188, test_abs_avg=22.586904525756836
production_forward2 grad[71] vs paper_forward: mean_abs=0.4232645034790039, max_abs=1.75, mean_rel=0.08319108188152313, max_rel=4.810271739959717, norm_rel=0.023714086040854454, ref_abs_avg=18.074438095092773, test_abs_avg=18.057117462158203
production_forward2 grad[72] vs paper_forward: mean_abs=0.529541552066803, max_abs=4.75, mean_rel=0.1476057469844818, max_rel=731.1327514648438, norm_rel=0.023830508813261986, ref_abs_avg=22.27348518371582, test_abs_avg=22.273317337036133
production_forward2 grad[73] vs paper_forward: mean_abs=0.4866744875907898, max_abs=3.75, mean_rel=0.20158979296684265, max_rel=1125.0, norm_rel=0.021880611777305603, ref_abs_avg=22.204076766967773, test_abs_avg=22.209495544433594
production_forward2 grad[74] vs paper_forward: mean_abs=0.4806530475616455, max_abs=2.0, mean_rel=0.25141221284866333, max_rel=56.892478942871094, norm_rel=0.023067310452461243, ref_abs_avg=20.882671356201172, test_abs_avg=20.883350372314453
production_forward2 grad[75] vs paper_forward: mean_abs=0.619985818862915, max_abs=5.0, mean_rel=0.16643047332763672, max_rel=1251.338134765625, norm_rel=0.025148380547761917, ref_abs_avg=24.687454223632812, test_abs_avg=24.68883514404297
production_forward2 grad[76] vs paper_forward: mean_abs=0.5675946474075317, max_abs=4.0, mean_rel=0.1925417184829712, max_rel=1374.9998779296875, norm_rel=0.023450685665011406, ref_abs_avg=24.217041015625, test_abs_avg=24.216707229614258
production_forward2 grad[77] vs paper_forward: mean_abs=0.43830013275146484, max_abs=1.75, mean_rel=0.08465898036956787, max_rel=6.04940128326416, norm_rel=0.02356320433318615, ref_abs_avg=18.544273376464844, test_abs_avg=18.547231674194336
production_forward2 grad[78] vs paper_forward: mean_abs=0.5574196577072144, max_abs=5.0, mean_rel=0.15954901278018951, max_rel=844.4090576171875, norm_rel=0.02464958280324936, ref_abs_avg=22.667020797729492, test_abs_avg=22.667652130126953
production_forward2 grad[79] vs paper_forward: mean_abs=0.5176754593849182, max_abs=4.4375, mean_rel=0.21794091165065765, max_rel=1250.0, norm_rel=0.023167593404650688, ref_abs_avg=22.364585876464844, test_abs_avg=22.362014770507812
production_forward2 grad[80] vs paper_forward: mean_abs=0.40109556913375854, max_abs=1.453125, mean_rel=0.07522393763065338, max_rel=3.330676555633545, norm_rel=0.023268437013030052, ref_abs_avg=17.12445640563965, test_abs_avg=17.092376708984375
production_forward2 grad[81] vs paper_forward: mean_abs=0.5084751844406128, max_abs=4.5, mean_rel=0.1474720537662506, max_rel=656.0960083007812, norm_rel=0.023829558864235878, ref_abs_avg=21.407182693481445, test_abs_avg=21.407411575317383
production_forward2 grad[82] vs paper_forward: mean_abs=0.4732853174209595, max_abs=3.625, mean_rel=0.20171032845973969, max_rel=1874.9998779296875, norm_rel=0.022067831829190254, ref_abs_avg=21.556543350219727, test_abs_avg=21.549219131469727
production_forward2 grad[83] vs paper_forward: mean_abs=0.40023088455200195, max_abs=1.53125, mean_rel=0.12053655833005905, max_rel=7.191678047180176, norm_rel=0.022383876144886017, ref_abs_avg=17.493812561035156, test_abs_avg=17.51602554321289
production_forward2 grad[84] vs paper_forward: mean_abs=0.4832112789154053, max_abs=5.0, mean_rel=0.14869166910648346, max_rel=1519.85400390625, norm_rel=0.02298944815993309, ref_abs_avg=21.098339080810547, test_abs_avg=21.097942352294922
production_forward2 grad[85] vs paper_forward: mean_abs=0.43710213899612427, max_abs=4.0, mean_rel=0.19338952004909515, max_rel=1125.0, norm_rel=0.021296031773090363, ref_abs_avg=20.573192596435547, test_abs_avg=20.567218780517578
production_forward2 grad[86] vs paper_forward: mean_abs=0.3286466598510742, max_abs=1.25, mean_rel=0.15065959095954895, max_rel=37.79346466064453, norm_rel=0.01929325796663761, ref_abs_avg=16.941539764404297, test_abs_avg=16.938655853271484
production_forward2 grad[87] vs paper_forward: mean_abs=0.4400268793106079, max_abs=4.5, mean_rel=0.14150124788284302, max_rel=1495.1728515625, norm_rel=0.022274568676948547, ref_abs_avg=19.849323272705078, test_abs_avg=19.848358154296875
production_forward2 grad[88] vs paper_forward: mean_abs=0.4090317189693451, max_abs=4.625, mean_rel=0.21414411067962646, max_rel=1718.7498779296875, norm_rel=0.020894674584269524, ref_abs_avg=19.691579818725586, test_abs_avg=19.690025329589844
production_forward2 grad[89] vs paper_forward: mean_abs=0.3073234558105469, max_abs=1.34375, mean_rel=0.05645526945590973, max_rel=2.0686264038085938, norm_rel=0.018259115517139435, ref_abs_avg=17.040508270263672, test_abs_avg=17.02959442138672
production_forward2 grad[90] vs paper_forward: mean_abs=0.42027202248573303, max_abs=4.1875, mean_rel=0.13516458868980408, max_rel=974.0303955078125, norm_rel=0.02191118523478508, ref_abs_avg=19.377521514892578, test_abs_avg=19.37621307373047
production_forward2 grad[91] vs paper_forward: mean_abs=0.380395770072937, max_abs=4.375, mean_rel=0.16581544280052185, max_rel=1414.0623779296875, norm_rel=0.020256394520401955, ref_abs_avg=19.019676208496094, test_abs_avg=19.01527976989746
production_forward2 grad[92] vs paper_forward: mean_abs=0.30869436264038086, max_abs=1.375, mean_rel=0.08556297421455383, max_rel=4.074711322784424, norm_rel=0.019550500437617302, ref_abs_avg=15.921136856079102, test_abs_avg=15.923690795898438
production_forward2 grad[93] vs paper_forward: mean_abs=0.3951822519302368, max_abs=6.0, mean_rel=0.124341681599617, max_rel=866.0677490234375, norm_rel=0.021324397996068, ref_abs_avg=18.797901153564453, test_abs_avg=18.797626495361328
production_forward2 grad[94] vs paper_forward: mean_abs=0.3503378629684448, max_abs=4.0, mean_rel=0.16133973002433777, max_rel=1296.8748779296875, norm_rel=0.018712414428591728, ref_abs_avg=18.880508422851562, test_abs_avg=18.885881423950195
production_forward2 grad[95] vs paper_forward: mean_abs=0.29520535469055176, max_abs=1.46875, mean_rel=0.07865563035011292, max_rel=7.072845935821533, norm_rel=0.019858378916978836, ref_abs_avg=14.863719940185547, test_abs_avg=14.859872817993164
production_forward2 grad[96] vs paper_forward: mean_abs=0.36113226413726807, max_abs=4.0, mean_rel=0.12277369946241379, max_rel=780.3638305664062, norm_rel=0.020556148141622543, ref_abs_avg=17.853126525878906, test_abs_avg=17.85384178161621
production_forward2 grad[97] vs paper_forward: mean_abs=0.3320552110671997, max_abs=4.5, mean_rel=0.16923286020755768, max_rel=1515.6248779296875, norm_rel=0.019071657210588455, ref_abs_avg=17.833831787109375, test_abs_avg=17.842727661132812
identity layers + randn queries
production_forward2 fwd+bwd:  113.679 ms
production_forward2 bwd-only: 96.110 ms
production_forward2 peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=2.322 GiB, fwd+bwd=10.322 GiB
paper_forward fwd+bwd:  381.647 ms
paper_forward bwd-only: 301.476 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.742 GiB, fwd+bwd=32.492 GiB
production_forward fwd+bwd:  114.390 ms
production_forward bwd-only: 95.988 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.322 GiB, fwd+bwd=10.322 GiB
torch_compile_phases_forward fwd+bwd:  167.059 ms
torch_compile_phases_forward bwd-only: 132.796 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.001635188004001975, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008254118263721466, max_abs=0.46875, mean_rel=0.07142855226993561, max_rel=84.25071716308594, norm_rel=0.019498102366924286, ref_abs_avg=0.4590655565261841, test_abs_avg=0.45908981561660767
production_forward grad[1] vs paper_forward: mean_abs=7.217409133911133, max_abs=64.0, mean_rel=0.18814757466316223, max_rel=283.8377685546875, norm_rel=0.02031250111758709, ref_abs_avg=316.07989501953125, test_abs_avg=316.1028137207031
production_forward grad[2] vs paper_forward: mean_abs=1.1672649383544922, max_abs=5.3125, mean_rel=0.0746937021613121, max_rel=3.2680954933166504, norm_rel=0.020889796316623688, ref_abs_avg=57.13299560546875, test_abs_avg=57.19126892089844
production_forward grad[3] vs paper_forward: mean_abs=1.5837829113006592, max_abs=10.0, mean_rel=0.16770640015602112, max_rel=2111.2255859375, norm_rel=0.02411481738090515, ref_abs_avg=66.07260131835938, test_abs_avg=66.08270263671875
production_forward grad[4] vs paper_forward: mean_abs=1.4548218250274658, max_abs=9.5, mean_rel=0.34008872509002686, max_rel=4125.0, norm_rel=0.0223686583340168, ref_abs_avg=65.448486328125, test_abs_avg=65.45756530761719
production_forward grad[5] vs paper_forward: mean_abs=1.118863582611084, max_abs=4.5, mean_rel=0.08746054768562317, max_rel=8.044106483459473, norm_rel=0.02396669238805771, ref_abs_avg=46.52621078491211, test_abs_avg=46.55138397216797
production_forward grad[6] vs paper_forward: mean_abs=1.383571982383728, max_abs=10.0, mean_rel=0.16262111067771912, max_rel=1926.5233154296875, norm_rel=0.023887598887085915, ref_abs_avg=58.34029006958008, test_abs_avg=58.345550537109375
production_forward grad[7] vs paper_forward: mean_abs=1.2759292125701904, max_abs=8.0, mean_rel=0.31346040964126587, max_rel=3937.499755859375, norm_rel=0.02234189584851265, ref_abs_avg=57.45882034301758, test_abs_avg=57.47107696533203
production_forward grad[8] vs paper_forward: mean_abs=0.9550971984863281, max_abs=5.5, mean_rel=0.11796113103628159, max_rel=11.825103759765625, norm_rel=0.022005051374435425, ref_abs_avg=44.943172454833984, test_abs_avg=44.959716796875
production_forward grad[9] vs paper_forward: mean_abs=1.2590186595916748, max_abs=9.0, mean_rel=0.16624762117862701, max_rel=2111.919189453125, norm_rel=0.023699238896369934, ref_abs_avg=53.49320983886719, test_abs_avg=53.49610137939453
production_forward grad[10] vs paper_forward: mean_abs=1.1612091064453125, max_abs=7.375, mean_rel=0.3279750943183899, max_rel=3515.624755859375, norm_rel=0.022009115666151047, ref_abs_avg=53.098548889160156, test_abs_avg=53.098777770996094
production_forward grad[11] vs paper_forward: mean_abs=0.9417877197265625, max_abs=3.296875, mean_rel=0.08208068460226059, max_rel=3.0862491130828857, norm_rel=0.02362288534641266, ref_abs_avg=39.278656005859375, test_abs_avg=39.241546630859375
production_forward grad[12] vs paper_forward: mean_abs=1.1699153184890747, max_abs=8.0, mean_rel=0.17578446865081787, max_rel=3663.533447265625, norm_rel=0.02357078716158867, ref_abs_avg=49.94597625732422, test_abs_avg=49.94803237915039
production_forward grad[13] vs paper_forward: mean_abs=1.0716944932937622, max_abs=6.375, mean_rel=0.3236766457557678, max_rel=2874.999755859375, norm_rel=0.021999657154083252, ref_abs_avg=49.02234649658203, test_abs_avg=49.02690887451172
production_forward grad[14] vs paper_forward: mean_abs=0.848389208316803, max_abs=4.3125, mean_rel=0.11404335498809814, max_rel=7.131842613220215, norm_rel=0.02339944802224636, ref_abs_avg=36.175323486328125, test_abs_avg=36.25230026245117
production_forward grad[15] vs paper_forward: mean_abs=1.083972454071045, max_abs=8.0, mean_rel=0.15634551644325256, max_rel=1292.647216796875, norm_rel=0.023381255567073822, ref_abs_avg=46.652191162109375, test_abs_avg=46.65874481201172
production_forward grad[16] vs paper_forward: mean_abs=1.000288724899292, max_abs=6.5, mean_rel=0.3208128809928894, max_rel=2624.999755859375, norm_rel=0.021964136511087418, ref_abs_avg=45.753623962402344, test_abs_avg=45.762542724609375
production_forward grad[17] vs paper_forward: mean_abs=0.7969808578491211, max_abs=2.9375, mean_rel=0.08091077208518982, max_rel=10.360486030578613, norm_rel=0.021509984508156776, ref_abs_avg=37.04426956176758, test_abs_avg=37.05257034301758
production_forward grad[18] vs paper_forward: mean_abs=1.0204840898513794, max_abs=7.0, mean_rel=0.14649748802185059, max_rel=1139.302490234375, norm_rel=0.023316167294979095, ref_abs_avg=44.05025863647461, test_abs_avg=44.05572509765625
production_forward grad[19] vs paper_forward: mean_abs=0.9414020776748657, max_abs=5.5, mean_rel=0.3527902364730835, max_rel=2874.999755859375, norm_rel=0.021573977544903755, ref_abs_avg=43.8243408203125, test_abs_avg=43.82910919189453
production_forward grad[20] vs paper_forward: mean_abs=0.7397775650024414, max_abs=3.5, mean_rel=0.09782842546701431, max_rel=12.603985786437988, norm_rel=0.02188926190137863, ref_abs_avg=34.004905700683594, test_abs_avg=34.03693389892578
production_forward grad[21] vs paper_forward: mean_abs=0.9818328022956848, max_abs=8.0, mean_rel=0.15291079878807068, max_rel=1891.8197021484375, norm_rel=0.023323388770222664, ref_abs_avg=42.34927749633789, test_abs_avg=42.35818862915039
production_forward grad[22] vs paper_forward: mean_abs=0.8967399597167969, max_abs=5.5, mean_rel=0.27219539880752563, max_rel=2749.999755859375, norm_rel=0.02159642055630684, ref_abs_avg=41.735679626464844, test_abs_avg=41.74177551269531
production_forward grad[23] vs paper_forward: mean_abs=0.7713708877563477, max_abs=3.0, mean_rel=0.1173962652683258, max_rel=12.449349403381348, norm_rel=0.024130530655384064, ref_abs_avg=32.46794509887695, test_abs_avg=32.45622253417969
production_forward grad[24] vs paper_forward: mean_abs=0.9284851551055908, max_abs=6.5, mean_rel=0.15146403014659882, max_rel=1787.9537353515625, norm_rel=0.023070499300956726, ref_abs_avg=40.499053955078125, test_abs_avg=40.5016975402832
production_forward grad[25] vs paper_forward: mean_abs=0.8457838296890259, max_abs=6.0, mean_rel=0.36818942427635193, max_rel=2999.999755859375, norm_rel=0.021267099305987358, ref_abs_avg=39.9589729309082, test_abs_avg=39.96659469604492
production_forward grad[26] vs paper_forward: mean_abs=0.8308687210083008, max_abs=3.0, mean_rel=0.09598810970783234, max_rel=16.390926361083984, norm_rel=0.023328300565481186, ref_abs_avg=37.52936553955078, test_abs_avg=37.58188247680664
production_forward grad[27] vs paper_forward: mean_abs=1.0677039623260498, max_abs=8.0, mean_rel=0.16218158602714539, max_rel=899.214599609375, norm_rel=0.02508406713604927, ref_abs_avg=42.823944091796875, test_abs_avg=42.83042907714844
production_forward grad[28] vs paper_forward: mean_abs=0.9947465658187866, max_abs=6.5, mean_rel=0.3523964285850525, max_rel=3343.749755859375, norm_rel=0.023485859856009483, ref_abs_avg=42.562828063964844, test_abs_avg=42.56150817871094
production_forward grad[29] vs paper_forward: mean_abs=0.7505815029144287, max_abs=2.75, mean_rel=0.18028345704078674, max_rel=32.270790100097656, norm_rel=0.02416209504008293, ref_abs_avg=31.24262237548828, test_abs_avg=31.29916000366211
production_forward grad[30] vs paper_forward: mean_abs=0.9800207018852234, max_abs=7.0, mean_rel=0.17342311143875122, max_rel=2083.692138671875, norm_rel=0.025317730382084846, ref_abs_avg=38.870052337646484, test_abs_avg=38.871612548828125
production_forward grad[31] vs paper_forward: mean_abs=0.9153714179992676, max_abs=5.5, mean_rel=0.37045347690582275, max_rel=3796.874755859375, norm_rel=0.02384340763092041, ref_abs_avg=38.52650451660156, test_abs_avg=38.52333450317383
production_forward grad[32] vs paper_forward: mean_abs=0.7366743087768555, max_abs=2.75, mean_rel=0.08936694264411926, max_rel=5.63331937789917, norm_rel=0.024048538878560066, ref_abs_avg=30.558549880981445, test_abs_avg=30.585567474365234
production_forward grad[33] vs paper_forward: mean_abs=0.908899188041687, max_abs=6.0, mean_rel=0.15717002749443054, max_rel=944.84521484375, norm_rel=0.025224991142749786, ref_abs_avg=36.209434509277344, test_abs_avg=36.21437072753906
production_forward grad[34] vs paper_forward: mean_abs=0.8510398864746094, max_abs=5.5625, mean_rel=0.2826564908027649, max_rel=2562.5, norm_rel=0.023872029036283493, ref_abs_avg=35.802024841308594, test_abs_avg=35.81287384033203
production_forward grad[35] vs paper_forward: mean_abs=0.6568779945373535, max_abs=2.25, mean_rel=0.07796497642993927, max_rel=3.078361749649048, norm_rel=0.023872971534729004, ref_abs_avg=27.544532775878906, test_abs_avg=27.553791046142578
production_forward grad[36] vs paper_forward: mean_abs=0.8554261922836304, max_abs=5.75, mean_rel=0.1702873408794403, max_rel=1340.2972412109375, norm_rel=0.0250126663595438, ref_abs_avg=34.343727111816406, test_abs_avg=34.345611572265625
production_forward grad[37] vs paper_forward: mean_abs=0.803168535232544, max_abs=5.25, mean_rel=0.2969411611557007, max_rel=3312.499755859375, norm_rel=0.0235790703445673, ref_abs_avg=34.16360092163086, test_abs_avg=34.17047119140625
production_forward grad[38] vs paper_forward: mean_abs=0.6438932418823242, max_abs=2.5, mean_rel=0.09592531621456146, max_rel=6.235928535461426, norm_rel=0.024706486612558365, ref_abs_avg=25.943302154541016, test_abs_avg=25.919635772705078
production_forward grad[39] vs paper_forward: mean_abs=0.8075098395347595, max_abs=5.0, mean_rel=0.16064441204071045, max_rel=972.1456909179688, norm_rel=0.024792181327939034, ref_abs_avg=32.72142791748047, test_abs_avg=32.725921630859375
production_forward grad[40] vs paper_forward: mean_abs=0.7491952180862427, max_abs=5.0, mean_rel=0.2784050703048706, max_rel=2218.75, norm_rel=0.023236293345689774, ref_abs_avg=32.34596252441406, test_abs_avg=32.34212875366211
production_forward grad[41] vs paper_forward: mean_abs=0.5810902118682861, max_abs=3.0, mean_rel=0.1475619524717331, max_rel=25.170862197875977, norm_rel=0.022226553410291672, ref_abs_avg=26.07061767578125, test_abs_avg=26.06245231628418
production_forward grad[42] vs paper_forward: mean_abs=0.7681218385696411, max_abs=5.0, mean_rel=0.16225433349609375, max_rel=1315.9168701171875, norm_rel=0.024607397615909576, ref_abs_avg=31.270896911621094, test_abs_avg=31.276172637939453
production_forward grad[43] vs paper_forward: mean_abs=0.7156988382339478, max_abs=5.0, mean_rel=0.2552964687347412, max_rel=2937.499755859375, norm_rel=0.023198651149868965, ref_abs_avg=30.984146118164062, test_abs_avg=30.985870361328125
production_forward grad[44] vs paper_forward: mean_abs=0.5561738014221191, max_abs=2.4140625, mean_rel=0.1382761001586914, max_rel=6.388233661651611, norm_rel=0.023643743246793747, ref_abs_avg=23.46734046936035, test_abs_avg=23.47060775756836
production_forward grad[45] vs paper_forward: mean_abs=0.740708589553833, max_abs=5.0, mean_rel=0.1573866307735443, max_rel=1285.208984375, norm_rel=0.02452964149415493, ref_abs_avg=30.314678192138672, test_abs_avg=30.316490173339844
production_forward grad[46] vs paper_forward: mean_abs=0.6869274377822876, max_abs=4.25, mean_rel=0.29695016145706177, max_rel=1671.8748779296875, norm_rel=0.022934500128030777, ref_abs_avg=29.97277069091797, test_abs_avg=29.97228240966797
production_forward grad[47] vs paper_forward: mean_abs=0.5563688278198242, max_abs=2.25, mean_rel=0.39095810055732727, max_rel=99.72444152832031, norm_rel=0.02353522554039955, ref_abs_avg=23.42156982421875, test_abs_avg=23.415143966674805
production_forward grad[48] vs paper_forward: mean_abs=0.7071256637573242, max_abs=5.25, mean_rel=0.16351991891860962, max_rel=1439.33935546875, norm_rel=0.0243070088326931, ref_abs_avg=29.198394775390625, test_abs_avg=29.201374053955078
production_forward grad[49] vs paper_forward: mean_abs=0.6562867164611816, max_abs=4.5, mean_rel=0.23431959748268127, max_rel=2125.0, norm_rel=0.022809701040387154, ref_abs_avg=28.800018310546875, test_abs_avg=28.801715850830078
production_forward grad[50] vs paper_forward: mean_abs=0.58502197265625, max_abs=2.25, mean_rel=0.09862317144870758, max_rel=5.717874050140381, norm_rel=0.02340656891465187, ref_abs_avg=24.86937713623047, test_abs_avg=24.883136749267578
production_forward grad[51] vs paper_forward: mean_abs=0.7704035043716431, max_abs=6.0, mean_rel=0.16734136641025543, max_rel=1122.3126220703125, norm_rel=0.02561551332473755, ref_abs_avg=30.192096710205078, test_abs_avg=30.193941116333008
production_forward grad[52] vs paper_forward: mean_abs=0.7219679355621338, max_abs=5.0, mean_rel=0.2849991023540497, max_rel=2890.624755859375, norm_rel=0.023999620229005814, ref_abs_avg=30.187246322631836, test_abs_avg=30.189804077148438
production_forward grad[53] vs paper_forward: mean_abs=0.5614951848983765, max_abs=2.5, mean_rel=0.08430532366037369, max_rel=5.231665134429932, norm_rel=0.022906072437763214, ref_abs_avg=24.882644653320312, test_abs_avg=24.851852416992188
production_forward grad[54] vs paper_forward: mean_abs=0.7163781523704529, max_abs=5.0, mean_rel=0.15720027685165405, max_rel=1437.2762451171875, norm_rel=0.025012098252773285, ref_abs_avg=28.714839935302734, test_abs_avg=28.71712303161621
production_forward grad[55] vs paper_forward: mean_abs=0.6657389998435974, max_abs=4.125, mean_rel=0.25111880898475647, max_rel=2062.5, norm_rel=0.023351943120360374, ref_abs_avg=28.522306442260742, test_abs_avg=28.527751922607422
production_forward grad[56] vs paper_forward: mean_abs=0.5095744132995605, max_abs=2.25, mean_rel=0.19518205523490906, max_rel=42.859580993652344, norm_rel=0.02295200526714325, ref_abs_avg=23.046619415283203, test_abs_avg=23.07625961303711
production_forward grad[57] vs paper_forward: mean_abs=0.6686404347419739, max_abs=5.0, mean_rel=0.16154447197914124, max_rel=1118.7974853515625, norm_rel=0.024532826617360115, ref_abs_avg=27.266088485717773, test_abs_avg=27.268333435058594
production_forward grad[58] vs paper_forward: mean_abs=0.6194443702697754, max_abs=4.25, mean_rel=0.24027681350708008, max_rel=1937.4998779296875, norm_rel=0.023280205205082893, ref_abs_avg=26.593229293823242, test_abs_avg=26.595840454101562
production_forward grad[59] vs paper_forward: mean_abs=0.5124139785766602, max_abs=1.75, mean_rel=0.0786282867193222, max_rel=5.824002742767334, norm_rel=0.02208160050213337, ref_abs_avg=23.038429260253906, test_abs_avg=23.022817611694336
production_forward grad[60] vs paper_forward: mean_abs=0.6316596865653992, max_abs=5.0, mean_rel=0.14799939095973969, max_rel=550.9259033203125, norm_rel=0.024288209155201912, ref_abs_avg=26.06376838684082, test_abs_avg=26.064678192138672
production_forward grad[61] vs paper_forward: mean_abs=0.5897327661514282, max_abs=4.25, mean_rel=0.25648751854896545, max_rel=3656.249755859375, norm_rel=0.022908655926585197, ref_abs_avg=25.800739288330078, test_abs_avg=25.802719116210938
production_forward grad[62] vs paper_forward: mean_abs=0.43053340911865234, max_abs=1.75, mean_rel=0.06323116272687912, max_rel=4.847390651702881, norm_rel=0.021135419607162476, ref_abs_avg=20.97031021118164, test_abs_avg=20.973730087280273
production_forward grad[63] vs paper_forward: mean_abs=0.5960235595703125, max_abs=5.0, mean_rel=0.1608661413192749, max_rel=872.1532592773438, norm_rel=0.023990042507648468, ref_abs_avg=24.897809982299805, test_abs_avg=24.896869659423828
production_forward grad[64] vs paper_forward: mean_abs=0.5533135533332825, max_abs=4.5, mean_rel=0.2618020474910736, max_rel=1937.4998779296875, norm_rel=0.022328583523631096, ref_abs_avg=24.801372528076172, test_abs_avg=24.806564331054688
production_forward grad[65] vs paper_forward: mean_abs=0.4332294464111328, max_abs=2.0, mean_rel=0.07927653193473816, max_rel=12.46036148071289, norm_rel=0.02075193263590336, ref_abs_avg=21.69683265686035, test_abs_avg=21.70431900024414
production_forward grad[66] vs paper_forward: mean_abs=0.569179356098175, max_abs=4.0, mean_rel=0.15130087733268738, max_rel=1288.48291015625, norm_rel=0.023594452068209648, ref_abs_avg=24.15422821044922, test_abs_avg=24.15627670288086
production_forward grad[67] vs paper_forward: mean_abs=0.5286988615989685, max_abs=3.625, mean_rel=0.23108522593975067, max_rel=1843.7498779296875, norm_rel=0.022611185908317566, ref_abs_avg=23.474653244018555, test_abs_avg=23.475780487060547
production_forward grad[68] vs paper_forward: mean_abs=0.44349074363708496, max_abs=1.8125, mean_rel=0.095668263733387, max_rel=9.489749908447266, norm_rel=0.023359281942248344, ref_abs_avg=19.061321258544922, test_abs_avg=19.058185577392578
production_forward grad[69] vs paper_forward: mean_abs=0.5457509756088257, max_abs=4.5, mean_rel=0.15251626074314117, max_rel=719.14306640625, norm_rel=0.02353360503911972, ref_abs_avg=23.195430755615234, test_abs_avg=23.196674346923828
production_forward grad[70] vs paper_forward: mean_abs=0.5063903331756592, max_abs=4.625, mean_rel=0.22187843918800354, max_rel=1374.9998779296875, norm_rel=0.02202502451837063, ref_abs_avg=22.9887752532959, test_abs_avg=22.988065719604492
production_forward grad[71] vs paper_forward: mean_abs=0.41698265075683594, max_abs=1.625, mean_rel=0.13761641085147858, max_rel=12.611499786376953, norm_rel=0.022660791873931885, ref_abs_avg=18.16246223449707, test_abs_avg=18.219593048095703
production_forward grad[72] vs paper_forward: mean_abs=0.5177488327026367, max_abs=6.0, mean_rel=0.15176326036453247, max_rel=887.2609252929688, norm_rel=0.023031514137983322, ref_abs_avg=22.501861572265625, test_abs_avg=22.50212287902832
production_forward grad[73] vs paper_forward: mean_abs=0.47873640060424805, max_abs=3.25, mean_rel=0.20643994212150574, max_rel=1187.5, norm_rel=0.021650681272149086, ref_abs_avg=22.17121124267578, test_abs_avg=22.171220779418945
production_forward grad[74] vs paper_forward: mean_abs=0.46165725588798523, max_abs=1.75, mean_rel=0.11740326881408691, max_rel=13.495622634887695, norm_rel=0.024846825748682022, ref_abs_avg=18.839786529541016, test_abs_avg=18.849136352539062
production_forward grad[75] vs paper_forward: mean_abs=0.5751758217811584, max_abs=5.5, mean_rel=0.1545742303133011, max_rel=1086.38671875, norm_rel=0.024230562150478363, ref_abs_avg=23.815685272216797, test_abs_avg=23.814083099365234
production_forward grad[76] vs paper_forward: mean_abs=0.5365727543830872, max_abs=4.0, mean_rel=0.22749334573745728, max_rel=1843.7498779296875, norm_rel=0.022541118785738945, ref_abs_avg=23.88721466064453, test_abs_avg=23.893962860107422
production_forward grad[77] vs paper_forward: mean_abs=0.4328773021697998, max_abs=1.5, mean_rel=0.1528884768486023, max_rel=20.161706924438477, norm_rel=0.022647250443696976, ref_abs_avg=19.327667236328125, test_abs_avg=19.316463470458984
production_forward grad[78] vs paper_forward: mean_abs=0.5374292135238647, max_abs=4.5, mean_rel=0.14564166963100433, max_rel=587.7843017578125, norm_rel=0.023675406351685524, ref_abs_avg=22.784852981567383, test_abs_avg=22.784799575805664
production_forward grad[79] vs paper_forward: mean_abs=0.49705395102500916, max_abs=4.0, mean_rel=0.19476643204689026, max_rel=1281.25, norm_rel=0.022006437182426453, ref_abs_avg=22.60968589782715, test_abs_avg=22.614742279052734
production_forward grad[80] vs paper_forward: mean_abs=0.39306962490081787, max_abs=1.2890625, mean_rel=0.6370588541030884, max_rel=245.03663635253906, norm_rel=0.022396383807063103, ref_abs_avg=17.64107322692871, test_abs_avg=17.620223999023438
production_forward grad[81] vs paper_forward: mean_abs=0.5102580785751343, max_abs=5.0, mean_rel=0.1433180868625641, max_rel=734.8797607421875, norm_rel=0.023080525919795036, ref_abs_avg=22.152057647705078, test_abs_avg=22.15187644958496
production_forward grad[82] vs paper_forward: mean_abs=0.46662282943725586, max_abs=4.25, mean_rel=0.20857685804367065, max_rel=1312.4998779296875, norm_rel=0.02166549675166607, ref_abs_avg=21.557828903198242, test_abs_avg=21.553199768066406
production_forward grad[83] vs paper_forward: mean_abs=0.3604913651943207, max_abs=1.25, mean_rel=0.13703829050064087, max_rel=11.354249000549316, norm_rel=0.020438354462385178, ref_abs_avg=17.42448616027832, test_abs_avg=17.41387176513672
production_forward grad[84] vs paper_forward: mean_abs=0.46754103899002075, max_abs=4.5, mean_rel=0.14707350730895996, max_rel=1070.1070556640625, norm_rel=0.022625699639320374, ref_abs_avg=20.756254196166992, test_abs_avg=20.756244659423828
production_forward grad[85] vs paper_forward: mean_abs=0.4394727349281311, max_abs=4.53125, mean_rel=0.20666146278381348, max_rel=1874.9998779296875, norm_rel=0.021003790199756622, ref_abs_avg=20.99995994567871, test_abs_avg=21.007102966308594
production_forward grad[86] vs paper_forward: mean_abs=0.3394787311553955, max_abs=1.34375, mean_rel=0.10791299492120743, max_rel=9.086942672729492, norm_rel=0.019247813150286674, ref_abs_avg=17.790767669677734, test_abs_avg=17.777114868164062
production_forward grad[87] vs paper_forward: mean_abs=0.4470929503440857, max_abs=4.375, mean_rel=0.13723893463611603, max_rel=701.9793701171875, norm_rel=0.022110430523753166, ref_abs_avg=20.353191375732422, test_abs_avg=20.352643966674805
production_forward grad[88] vs paper_forward: mean_abs=0.40752917528152466, max_abs=4.1875, mean_rel=0.18865495920181274, max_rel=1312.4998779296875, norm_rel=0.020790111273527145, ref_abs_avg=19.777435302734375, test_abs_avg=19.784366607666016
production_forward grad[89] vs paper_forward: mean_abs=0.326448917388916, max_abs=1.25, mean_rel=0.14266221225261688, max_rel=7.919996738433838, norm_rel=0.01991857774555683, ref_abs_avg=16.15457534790039, test_abs_avg=16.184955596923828
production_forward grad[90] vs paper_forward: mean_abs=0.4254674017429352, max_abs=6.0, mean_rel=0.1286354511976242, max_rel=702.1995849609375, norm_rel=0.021715456619858742, ref_abs_avg=19.763397216796875, test_abs_avg=19.76457405090332
production_forward grad[91] vs paper_forward: mean_abs=0.3896591067314148, max_abs=4.40625, mean_rel=0.1802660971879959, max_rel=1812.4998779296875, norm_rel=0.020044611766934395, ref_abs_avg=19.591726303100586, test_abs_avg=19.592323303222656
production_forward grad[92] vs paper_forward: mean_abs=0.3078598976135254, max_abs=1.5, mean_rel=0.05147252231836319, max_rel=2.433365821838379, norm_rel=0.01833803951740265, ref_abs_avg=16.913660049438477, test_abs_avg=16.909252166748047
production_forward grad[93] vs paper_forward: mean_abs=0.4032936096191406, max_abs=5.0, mean_rel=0.13354916870594025, max_rel=746.3300170898438, norm_rel=0.021394157782197, ref_abs_avg=19.077430725097656, test_abs_avg=19.077842712402344
production_forward grad[94] vs paper_forward: mean_abs=0.3577747344970703, max_abs=4.0, mean_rel=0.19079332053661346, max_rel=1390.6248779296875, norm_rel=0.019434720277786255, ref_abs_avg=18.60382843017578, test_abs_avg=18.60901641845703
production_forward grad[95] vs paper_forward: mean_abs=0.30394887924194336, max_abs=1.5, mean_rel=0.06942612677812576, max_rel=2.9782164096832275, norm_rel=0.01984385773539543, ref_abs_avg=15.25815486907959, test_abs_avg=15.275117874145508
production_forward grad[96] vs paper_forward: mean_abs=0.3640972971916199, max_abs=4.0, mean_rel=0.1216895580291748, max_rel=464.0430603027344, norm_rel=0.02065253257751465, ref_abs_avg=17.935993194580078, test_abs_avg=17.935558319091797
production_forward grad[97] vs paper_forward: mean_abs=0.34079134464263916, max_abs=3.5, mean_rel=0.1658056378364563, max_rel=1171.875, norm_rel=0.01904706284403801, ref_abs_avg=18.14366912841797, test_abs_avg=18.1508731842041
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016369116492569447, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.00860179029405117, max_abs=0.4375, mean_rel=0.07409583777189255, max_rel=85.39138793945312, norm_rel=0.02019369788467884, ref_abs_avg=0.4590655565261841, test_abs_avg=0.45907747745513916
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.399770736694336, max_abs=66.0, mean_rel=0.19147951900959015, max_rel=267.8183898925781, norm_rel=0.02066880092024803, ref_abs_avg=316.07989501953125, test_abs_avg=316.072998046875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2534523010253906, max_abs=5.75, mean_rel=0.07449179887771606, max_rel=3.1284637451171875, norm_rel=0.022493528202176094, ref_abs_avg=57.13299560546875, test_abs_avg=57.12386703491211
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6349848508834839, max_abs=11.0, mean_rel=0.18018603324890137, max_rel=3324.089111328125, norm_rel=0.024895990267395973, ref_abs_avg=66.07260131835938, test_abs_avg=66.08351135253906
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5032484531402588, max_abs=9.5, mean_rel=0.35094305872917175, max_rel=3187.499755859375, norm_rel=0.02309933304786682, ref_abs_avg=65.448486328125, test_abs_avg=65.45999145507812
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0997123718261719, max_abs=4.25, mean_rel=0.08436781913042068, max_rel=5.287418365478516, norm_rel=0.02393955923616886, ref_abs_avg=46.52621078491211, test_abs_avg=46.48644256591797
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4256224632263184, max_abs=10.0, mean_rel=0.1717785745859146, max_rel=2031.7799072265625, norm_rel=0.024589283391833305, ref_abs_avg=58.34029006958008, test_abs_avg=58.344505310058594
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3239299058914185, max_abs=8.0, mean_rel=0.3231745958328247, max_rel=3812.499755859375, norm_rel=0.023162996396422386, ref_abs_avg=57.45882034301758, test_abs_avg=57.46797180175781
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9665126800537109, max_abs=6.0, mean_rel=0.11971983313560486, max_rel=19.721324920654297, norm_rel=0.022379102185368538, ref_abs_avg=44.943172454833984, test_abs_avg=45.01122283935547
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2960023880004883, max_abs=9.5, mean_rel=0.1687103509902954, max_rel=2602.705810546875, norm_rel=0.02439173497259617, ref_abs_avg=53.49320983886719, test_abs_avg=53.4974365234375
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1962335109710693, max_abs=7.5, mean_rel=0.321435809135437, max_rel=3781.249755859375, norm_rel=0.022679181769490242, ref_abs_avg=53.098548889160156, test_abs_avg=53.10271453857422
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9604625701904297, max_abs=4.0, mean_rel=0.09749758243560791, max_rel=7.888618469238281, norm_rel=0.024487031623721123, ref_abs_avg=39.278656005859375, test_abs_avg=39.25920867919922
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.204725980758667, max_abs=8.0, mean_rel=0.17418172955513, max_rel=3017.06884765625, norm_rel=0.024253619834780693, ref_abs_avg=49.94597625732422, test_abs_avg=49.94739532470703
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1083738803863525, max_abs=7.3125, mean_rel=0.318626344203949, max_rel=2937.499755859375, norm_rel=0.02272297628223896, ref_abs_avg=49.02234649658203, test_abs_avg=49.02566146850586
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8620550632476807, max_abs=3.5, mean_rel=0.1423630267381668, max_rel=17.78533172607422, norm_rel=0.02343718707561493, ref_abs_avg=36.175323486328125, test_abs_avg=36.252777099609375
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1129072904586792, max_abs=8.25, mean_rel=0.16367584466934204, max_rel=1367.6065673828125, norm_rel=0.0239932369440794, ref_abs_avg=46.652191162109375, test_abs_avg=46.654762268066406
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0310685634613037, max_abs=6.0, mean_rel=0.32480576634407043, max_rel=2874.999755859375, norm_rel=0.02259754203259945, ref_abs_avg=45.753623962402344, test_abs_avg=45.76317596435547
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8154525756835938, max_abs=3.4375, mean_rel=0.08579088002443314, max_rel=7.929996967315674, norm_rel=0.02191600203514099, ref_abs_avg=37.04426956176758, test_abs_avg=37.046417236328125
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.046779751777649, max_abs=8.0, mean_rel=0.1512300819158554, max_rel=1398.890625, norm_rel=0.023904874920845032, ref_abs_avg=44.05025863647461, test_abs_avg=44.05258560180664
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9658961296081543, max_abs=6.0, mean_rel=0.3724266290664673, max_rel=2781.249755859375, norm_rel=0.022120101377367973, ref_abs_avg=43.8243408203125, test_abs_avg=43.827964782714844
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7617217898368835, max_abs=3.0, mean_rel=0.09702825546264648, max_rel=7.941767692565918, norm_rel=0.022451162338256836, ref_abs_avg=34.004905700683594, test_abs_avg=34.05855941772461
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0012590885162354, max_abs=8.0, mean_rel=0.15401677787303925, max_rel=1770.2657470703125, norm_rel=0.02377237379550934, ref_abs_avg=42.34927749633789, test_abs_avg=42.35691833496094
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9204645156860352, max_abs=5.5, mean_rel=0.31185460090637207, max_rel=2937.499755859375, norm_rel=0.022165091708302498, ref_abs_avg=41.735679626464844, test_abs_avg=41.74192810058594
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7542362213134766, max_abs=3.25, mean_rel=0.14922872185707092, max_rel=12.95962142944336, norm_rel=0.02336617186665535, ref_abs_avg=32.46794509887695, test_abs_avg=32.44956970214844
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9484843015670776, max_abs=6.5, mean_rel=0.15970999002456665, max_rel=2028.771240234375, norm_rel=0.023570409044623375, ref_abs_avg=40.499053955078125, test_abs_avg=40.50202941894531
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8667015433311462, max_abs=6.0, mean_rel=0.36776024103164673, max_rel=2749.999755859375, norm_rel=0.021791821345686913, ref_abs_avg=39.9589729309082, test_abs_avg=39.962520599365234
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.9123029708862305, max_abs=4.0, mean_rel=0.09667990356683731, max_rel=15.063389778137207, norm_rel=0.02469967119395733, ref_abs_avg=37.52936553955078, test_abs_avg=37.599082946777344
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0934720039367676, max_abs=7.5, mean_rel=0.1663876175880432, max_rel=2061.431884765625, norm_rel=0.025680817663669586, ref_abs_avg=42.823944091796875, test_abs_avg=42.83099365234375
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.019291639328003, max_abs=7.5, mean_rel=0.34124991297721863, max_rel=3281.249755859375, norm_rel=0.024054912850260735, ref_abs_avg=42.562828063964844, test_abs_avg=42.55326843261719
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7711455821990967, max_abs=3.0, mean_rel=0.14244963228702545, max_rel=18.74564552307129, norm_rel=0.024753388017416, ref_abs_avg=31.24262237548828, test_abs_avg=31.308902740478516
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9987108707427979, max_abs=6.375, mean_rel=0.17368584871292114, max_rel=1011.6754760742188, norm_rel=0.025797400623559952, ref_abs_avg=38.870052337646484, test_abs_avg=38.87086868286133
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9316962957382202, max_abs=6.0, mean_rel=0.342739999294281, max_rel=3874.999755859375, norm_rel=0.02429291233420372, ref_abs_avg=38.52650451660156, test_abs_avg=38.518096923828125
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.751396656036377, max_abs=2.75, mean_rel=0.10262687504291534, max_rel=3.738696813583374, norm_rel=0.024423403665423393, ref_abs_avg=30.558549880981445, test_abs_avg=30.615589141845703
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9260880947113037, max_abs=6.5, mean_rel=0.1594943106174469, max_rel=953.2427368164062, norm_rel=0.025695135816931725, ref_abs_avg=36.209434509277344, test_abs_avg=36.2120475769043
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8719179630279541, max_abs=5.875, mean_rel=0.27717071771621704, max_rel=2999.999755859375, norm_rel=0.024436451494693756, ref_abs_avg=35.802024841308594, test_abs_avg=35.81330871582031
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6863975524902344, max_abs=2.875, mean_rel=0.13793562352657318, max_rel=30.305191040039062, norm_rel=0.02513636089861393, ref_abs_avg=27.544532775878906, test_abs_avg=27.59625244140625
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.871277391910553, max_abs=6.0, mean_rel=0.166402205824852, max_rel=936.1011962890625, norm_rel=0.025479933246970177, ref_abs_avg=34.343727111816406, test_abs_avg=34.345890045166016
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8171588182449341, max_abs=4.75, mean_rel=0.30181264877319336, max_rel=3312.499755859375, norm_rel=0.02397400513291359, ref_abs_avg=34.16360092163086, test_abs_avg=34.16615295410156
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6598052978515625, max_abs=2.75, mean_rel=0.08980894088745117, max_rel=7.717146396636963, norm_rel=0.025206871330738068, ref_abs_avg=25.943302154541016, test_abs_avg=25.92978286743164
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8202775716781616, max_abs=5.5, mean_rel=0.1607522815465927, max_rel=972.1456909179688, norm_rel=0.025200074538588524, ref_abs_avg=32.72142791748047, test_abs_avg=32.72490310668945
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7653731107711792, max_abs=5.0, mean_rel=0.2974589169025421, max_rel=2500.0, norm_rel=0.02372068539261818, ref_abs_avg=32.34596252441406, test_abs_avg=32.3436279296875
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6125237941741943, max_abs=2.5, mean_rel=0.11447888612747192, max_rel=11.451580047607422, norm_rel=0.023397227749228477, ref_abs_avg=26.07061767578125, test_abs_avg=26.056106567382812
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7791919708251953, max_abs=5.25, mean_rel=0.1646215319633484, max_rel=1069.5557861328125, norm_rel=0.02496306225657463, ref_abs_avg=31.270896911621094, test_abs_avg=31.275041580200195
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7281936407089233, max_abs=4.5625, mean_rel=0.25498631596565247, max_rel=3312.499755859375, norm_rel=0.02357235550880432, ref_abs_avg=30.984146118164062, test_abs_avg=30.983577728271484
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5682640075683594, max_abs=2.3984375, mean_rel=0.13139314949512482, max_rel=5.87535285949707, norm_rel=0.023849178105592728, ref_abs_avg=23.46734046936035, test_abs_avg=23.481475830078125
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7505407333374023, max_abs=5.0, mean_rel=0.1636291742324829, max_rel=1162.7957763671875, norm_rel=0.024852637201547623, ref_abs_avg=30.314678192138672, test_abs_avg=30.315149307250977
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6956720948219299, max_abs=4.25, mean_rel=0.3159499764442444, max_rel=2312.5, norm_rel=0.023250572383403778, ref_abs_avg=29.97277069091797, test_abs_avg=29.970104217529297
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5591492652893066, max_abs=2.5, mean_rel=0.26590996980667114, max_rel=56.01482391357422, norm_rel=0.0237685926258564, ref_abs_avg=23.42156982421875, test_abs_avg=23.423969268798828
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7151904106140137, max_abs=5.0, mean_rel=0.16511908173561096, max_rel=1533.5302734375, norm_rel=0.02458161488175392, ref_abs_avg=29.198394775390625, test_abs_avg=29.201091766357422
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6651932001113892, max_abs=5.0, mean_rel=0.23768913745880127, max_rel=1937.4998779296875, norm_rel=0.023128457367420197, ref_abs_avg=28.800018310546875, test_abs_avg=28.799341201782227
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.5846753120422363, max_abs=2.25, mean_rel=0.09248776733875275, max_rel=5.854560852050781, norm_rel=0.022960713133215904, ref_abs_avg=24.86937713623047, test_abs_avg=24.8797607421875
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.782900333404541, max_abs=7.0, mean_rel=0.1678888499736786, max_rel=1812.8668212890625, norm_rel=0.026018625125288963, ref_abs_avg=30.192096710205078, test_abs_avg=30.194881439208984
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7334544658660889, max_abs=5.5, mean_rel=0.3020210862159729, max_rel=3484.374755859375, norm_rel=0.02437019534409046, ref_abs_avg=30.187246322631836, test_abs_avg=30.192058563232422
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5639843940734863, max_abs=2.375, mean_rel=0.09988737106323242, max_rel=5.835427761077881, norm_rel=0.02296236902475357, ref_abs_avg=24.882644653320312, test_abs_avg=24.899471282958984
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7268123030662537, max_abs=5.0, mean_rel=0.16518144309520721, max_rel=1076.3609619140625, norm_rel=0.02536904439330101, ref_abs_avg=28.714839935302734, test_abs_avg=28.7164306640625
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6777290105819702, max_abs=4.5, mean_rel=0.2592666447162628, max_rel=2062.5, norm_rel=0.023780040442943573, ref_abs_avg=28.522306442260742, test_abs_avg=28.522850036621094
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5196741819381714, max_abs=1.875, mean_rel=0.19767287373542786, max_rel=36.414764404296875, norm_rel=0.023023391142487526, ref_abs_avg=23.046619415283203, test_abs_avg=23.072385787963867
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6781789064407349, max_abs=5.5, mean_rel=0.1576482653617859, max_rel=1209.7906494140625, norm_rel=0.024869060143828392, ref_abs_avg=27.266088485717773, test_abs_avg=27.267845153808594
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6293118000030518, max_abs=4.125, mean_rel=0.23962408304214478, max_rel=2125.0, norm_rel=0.0236571803689003, ref_abs_avg=26.593229293823242, test_abs_avg=26.593189239501953
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5072746276855469, max_abs=2.125, mean_rel=0.0802919790148735, max_rel=7.7846856117248535, norm_rel=0.021539678797125816, ref_abs_avg=23.038429260253906, test_abs_avg=23.030242919921875
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.639074444770813, max_abs=4.5, mean_rel=0.1514105647802353, max_rel=705.9142456054688, norm_rel=0.024567758664488792, ref_abs_avg=26.06376838684082, test_abs_avg=26.06439208984375
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5974512696266174, max_abs=4.5, mean_rel=0.2712145447731018, max_rel=3968.749755859375, norm_rel=0.023177457973361015, ref_abs_avg=25.800739288330078, test_abs_avg=25.801288604736328
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4667062759399414, max_abs=1.75, mean_rel=0.0768270492553711, max_rel=7.095456123352051, norm_rel=0.02256421372294426, ref_abs_avg=20.97031021118164, test_abs_avg=20.968585968017578
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6030903458595276, max_abs=4.515625, mean_rel=0.16609716415405273, max_rel=998.533935546875, norm_rel=0.024258175864815712, ref_abs_avg=24.897809982299805, test_abs_avg=24.897113800048828
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5593738555908203, max_abs=4.0, mean_rel=0.2709830701351166, max_rel=1742.1873779296875, norm_rel=0.022559577599167824, ref_abs_avg=24.801372528076172, test_abs_avg=24.804893493652344
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.46102333068847656, max_abs=1.9921875, mean_rel=0.07887570559978485, max_rel=11.702054977416992, norm_rel=0.021381087601184845, ref_abs_avg=21.69683265686035, test_abs_avg=21.689189910888672
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5742790102958679, max_abs=4.625, mean_rel=0.15400859713554382, max_rel=852.25732421875, norm_rel=0.023788999766111374, ref_abs_avg=24.15422821044922, test_abs_avg=24.15557861328125
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5334155559539795, max_abs=3.6875, mean_rel=0.2342870831489563, max_rel=1437.4998779296875, norm_rel=0.022769983857870102, ref_abs_avg=23.474653244018555, test_abs_avg=23.476459503173828
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.44388413429260254, max_abs=1.875, mean_rel=0.10560952126979828, max_rel=14.671096801757812, norm_rel=0.02352786809206009, ref_abs_avg=19.061321258544922, test_abs_avg=19.055173873901367
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5496603846549988, max_abs=4.75, mean_rel=0.15041007101535797, max_rel=868.8768310546875, norm_rel=0.023705976083874702, ref_abs_avg=23.195430755615234, test_abs_avg=23.195920944213867
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5075253248214722, max_abs=4.625, mean_rel=0.21951737999916077, max_rel=1531.2498779296875, norm_rel=0.022089364007115364, ref_abs_avg=22.9887752532959, test_abs_avg=22.98391342163086
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4176445007324219, max_abs=1.75, mean_rel=0.10639332979917526, max_rel=10.713531494140625, norm_rel=0.022392649203538895, ref_abs_avg=18.16246223449707, test_abs_avg=18.222257614135742
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5208480358123779, max_abs=5.0, mean_rel=0.14961311221122742, max_rel=590.0529174804688, norm_rel=0.023168807849287987, ref_abs_avg=22.501861572265625, test_abs_avg=22.50341796875
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4836375117301941, max_abs=3.90625, mean_rel=0.2091389000415802, max_rel=1453.1248779296875, norm_rel=0.021874042227864265, ref_abs_avg=22.17121124267578, test_abs_avg=22.170005798339844
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.46100425720214844, max_abs=1.875, mean_rel=0.1084110364317894, max_rel=7.227385520935059, norm_rel=0.025407759472727776, ref_abs_avg=18.839786529541016, test_abs_avg=18.861520767211914
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5806550979614258, max_abs=4.5, mean_rel=0.15208259224891663, max_rel=693.1245727539062, norm_rel=0.02446105144917965, ref_abs_avg=23.815685272216797, test_abs_avg=23.814308166503906
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.543059766292572, max_abs=4.25, mean_rel=0.22121551632881165, max_rel=1718.7498779296875, norm_rel=0.022830253466963768, ref_abs_avg=23.88721466064453, test_abs_avg=23.891199111938477
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.42584753036499023, max_abs=1.625, mean_rel=0.12626367807388306, max_rel=15.31309700012207, norm_rel=0.02233978919684887, ref_abs_avg=19.327667236328125, test_abs_avg=19.314044952392578
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5422860383987427, max_abs=5.0, mean_rel=0.14993399381637573, max_rel=807.330078125, norm_rel=0.023851841688156128, ref_abs_avg=22.784852981567383, test_abs_avg=22.784616470336914
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5018888115882874, max_abs=4.125, mean_rel=0.21035777032375336, max_rel=1374.9998779296875, norm_rel=0.022226577624678612, ref_abs_avg=22.60968589782715, test_abs_avg=22.61509132385254
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.39544782042503357, max_abs=1.4140625, mean_rel=0.9567053318023682, max_rel=243.61138916015625, norm_rel=0.022788135334849358, ref_abs_avg=17.64107322692871, test_abs_avg=17.628807067871094
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5139762163162231, max_abs=5.0, mean_rel=0.14620596170425415, max_rel=895.014892578125, norm_rel=0.023250434547662735, ref_abs_avg=22.152057647705078, test_abs_avg=22.150285720825195
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.472886323928833, max_abs=4.125, mean_rel=0.19825631380081177, max_rel=1078.125, norm_rel=0.021955624222755432, ref_abs_avg=21.557828903198242, test_abs_avg=21.551773071289062
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.35885143280029297, max_abs=1.5, mean_rel=0.12838611006736755, max_rel=11.429348945617676, norm_rel=0.020275652408599854, ref_abs_avg=17.42448616027832, test_abs_avg=17.390356063842773
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.470198392868042, max_abs=4.0, mean_rel=0.14723682403564453, max_rel=792.5250244140625, norm_rel=0.02274160645902157, ref_abs_avg=20.756254196166992, test_abs_avg=20.755550384521484
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.43956780433654785, max_abs=4.25, mean_rel=0.20477372407913208, max_rel=1812.4998779296875, norm_rel=0.021017855033278465, ref_abs_avg=20.99995994567871, test_abs_avg=21.003389358520508
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.35568273067474365, max_abs=1.75, mean_rel=0.1670072376728058, max_rel=26.972454071044922, norm_rel=0.019911833107471466, ref_abs_avg=17.790767669677734, test_abs_avg=17.797494888305664
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4500865638256073, max_abs=4.625, mean_rel=0.1406865417957306, max_rel=978.501708984375, norm_rel=0.022236138582229614, ref_abs_avg=20.353191375732422, test_abs_avg=20.352209091186523
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.41130003333091736, max_abs=4.46875, mean_rel=0.18826425075531006, max_rel=1624.9998779296875, norm_rel=0.021010136231780052, ref_abs_avg=19.777435302734375, test_abs_avg=19.782135009765625
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.31466376781463623, max_abs=1.1640625, mean_rel=0.13617706298828125, max_rel=16.94199562072754, norm_rel=0.019417978823184967, ref_abs_avg=16.15457534790039, test_abs_avg=16.19007110595703
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.42731231451034546, max_abs=4.5, mean_rel=0.1303638517856598, max_rel=565.1610107421875, norm_rel=0.021791884675621986, ref_abs_avg=19.763397216796875, test_abs_avg=19.76405906677246
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3883424699306488, max_abs=4.0078125, mean_rel=0.1872791200876236, max_rel=1125.0, norm_rel=0.019981659948825836, ref_abs_avg=19.591726303100586, test_abs_avg=19.594463348388672
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.30695438385009766, max_abs=1.5, mean_rel=0.05150889605283737, max_rel=2.742830276489258, norm_rel=0.0186830572783947, ref_abs_avg=16.913660049438477, test_abs_avg=16.90332794189453
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.40361446142196655, max_abs=5.0, mean_rel=0.13400885462760925, max_rel=882.6089477539062, norm_rel=0.021424168720841408, ref_abs_avg=19.077430725097656, test_abs_avg=19.07790756225586
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3613154888153076, max_abs=3.75, mean_rel=0.1869010329246521, max_rel=1250.0, norm_rel=0.01962258107960224, ref_abs_avg=18.60382843017578, test_abs_avg=18.61098861694336
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.30347728729248047, max_abs=1.25, mean_rel=0.07459533214569092, max_rel=5.090272426605225, norm_rel=0.019947391003370285, ref_abs_avg=15.25815486907959, test_abs_avg=15.268104553222656
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.36416923999786377, max_abs=4.5, mean_rel=0.12258842587471008, max_rel=415.02471923828125, norm_rel=0.020655620843172073, ref_abs_avg=17.935993194580078, test_abs_avg=17.934917449951172
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3410471975803375, max_abs=3.75, mean_rel=0.1740201711654663, max_rel=1640.6248779296875, norm_rel=0.019096842035651207, ref_abs_avg=18.14366912841797, test_abs_avg=18.147144317626953
production_forward2 vs paper_forward output: mean_abs=0.001635188004001975, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008254118263721466, max_abs=0.46875, mean_rel=0.07142855226993561, max_rel=84.25071716308594, norm_rel=0.019498102366924286, ref_abs_avg=0.4590655565261841, test_abs_avg=0.45908981561660767
production_forward2 grad[1] vs paper_forward: mean_abs=7.217443466186523, max_abs=64.0, mean_rel=0.18814873695373535, max_rel=283.8377685546875, norm_rel=0.02031254768371582, ref_abs_avg=316.07989501953125, test_abs_avg=316.1027526855469
production_forward2 grad[2] vs paper_forward: mean_abs=1.1672649383544922, max_abs=5.3125, mean_rel=0.0746937021613121, max_rel=3.2680954933166504, norm_rel=0.020889796316623688, ref_abs_avg=57.13299560546875, test_abs_avg=57.19126892089844
production_forward2 grad[3] vs paper_forward: mean_abs=1.5837829113006592, max_abs=10.0, mean_rel=0.16770640015602112, max_rel=2111.2255859375, norm_rel=0.02411481738090515, ref_abs_avg=66.07260131835938, test_abs_avg=66.08270263671875
production_forward2 grad[4] vs paper_forward: mean_abs=1.4548218250274658, max_abs=9.5, mean_rel=0.34008872509002686, max_rel=4125.0, norm_rel=0.0223686583340168, ref_abs_avg=65.448486328125, test_abs_avg=65.45756530761719
production_forward2 grad[5] vs paper_forward: mean_abs=1.118863582611084, max_abs=4.5, mean_rel=0.08746054768562317, max_rel=8.044106483459473, norm_rel=0.02396669238805771, ref_abs_avg=46.52621078491211, test_abs_avg=46.55138397216797
production_forward2 grad[6] vs paper_forward: mean_abs=1.383571982383728, max_abs=10.0, mean_rel=0.16262111067771912, max_rel=1926.5233154296875, norm_rel=0.023887598887085915, ref_abs_avg=58.34029006958008, test_abs_avg=58.345550537109375
production_forward2 grad[7] vs paper_forward: mean_abs=1.2759292125701904, max_abs=8.0, mean_rel=0.31346040964126587, max_rel=3937.499755859375, norm_rel=0.02234189584851265, ref_abs_avg=57.45882034301758, test_abs_avg=57.47107696533203
production_forward2 grad[8] vs paper_forward: mean_abs=0.9550971984863281, max_abs=5.5, mean_rel=0.11796113103628159, max_rel=11.825103759765625, norm_rel=0.022005051374435425, ref_abs_avg=44.943172454833984, test_abs_avg=44.959716796875
production_forward2 grad[9] vs paper_forward: mean_abs=1.2590186595916748, max_abs=9.0, mean_rel=0.16624762117862701, max_rel=2111.919189453125, norm_rel=0.023699238896369934, ref_abs_avg=53.49320983886719, test_abs_avg=53.49610137939453
production_forward2 grad[10] vs paper_forward: mean_abs=1.1612091064453125, max_abs=7.375, mean_rel=0.3279750943183899, max_rel=3515.624755859375, norm_rel=0.022009115666151047, ref_abs_avg=53.098548889160156, test_abs_avg=53.098777770996094
production_forward2 grad[11] vs paper_forward: mean_abs=0.9417877197265625, max_abs=3.296875, mean_rel=0.08208068460226059, max_rel=3.0862491130828857, norm_rel=0.02362288534641266, ref_abs_avg=39.278656005859375, test_abs_avg=39.241546630859375
production_forward2 grad[12] vs paper_forward: mean_abs=1.1699153184890747, max_abs=8.0, mean_rel=0.17578446865081787, max_rel=3663.533447265625, norm_rel=0.02357078716158867, ref_abs_avg=49.94597625732422, test_abs_avg=49.94803237915039
production_forward2 grad[13] vs paper_forward: mean_abs=1.0716944932937622, max_abs=6.375, mean_rel=0.3236766457557678, max_rel=2874.999755859375, norm_rel=0.021999657154083252, ref_abs_avg=49.02234649658203, test_abs_avg=49.02690887451172
production_forward2 grad[14] vs paper_forward: mean_abs=0.848389208316803, max_abs=4.3125, mean_rel=0.11404335498809814, max_rel=7.131842613220215, norm_rel=0.02339944802224636, ref_abs_avg=36.175323486328125, test_abs_avg=36.25230026245117
production_forward2 grad[15] vs paper_forward: mean_abs=1.083972454071045, max_abs=8.0, mean_rel=0.15634551644325256, max_rel=1292.647216796875, norm_rel=0.023381255567073822, ref_abs_avg=46.652191162109375, test_abs_avg=46.65874481201172
production_forward2 grad[16] vs paper_forward: mean_abs=1.000288724899292, max_abs=6.5, mean_rel=0.3208128809928894, max_rel=2624.999755859375, norm_rel=0.021964136511087418, ref_abs_avg=45.753623962402344, test_abs_avg=45.762542724609375
production_forward2 grad[17] vs paper_forward: mean_abs=0.7969808578491211, max_abs=2.9375, mean_rel=0.08091077208518982, max_rel=10.360486030578613, norm_rel=0.021509984508156776, ref_abs_avg=37.04426956176758, test_abs_avg=37.05257034301758
production_forward2 grad[18] vs paper_forward: mean_abs=1.0204840898513794, max_abs=7.0, mean_rel=0.14649748802185059, max_rel=1139.302490234375, norm_rel=0.023316167294979095, ref_abs_avg=44.05025863647461, test_abs_avg=44.05572509765625
production_forward2 grad[19] vs paper_forward: mean_abs=0.9414020776748657, max_abs=5.5, mean_rel=0.3527902364730835, max_rel=2874.999755859375, norm_rel=0.021573977544903755, ref_abs_avg=43.8243408203125, test_abs_avg=43.82910919189453
production_forward2 grad[20] vs paper_forward: mean_abs=0.7397775650024414, max_abs=3.5, mean_rel=0.09782842546701431, max_rel=12.603985786437988, norm_rel=0.02188926190137863, ref_abs_avg=34.004905700683594, test_abs_avg=34.03693389892578
production_forward2 grad[21] vs paper_forward: mean_abs=0.9818328022956848, max_abs=8.0, mean_rel=0.15291079878807068, max_rel=1891.8197021484375, norm_rel=0.023323388770222664, ref_abs_avg=42.34927749633789, test_abs_avg=42.35818862915039
production_forward2 grad[22] vs paper_forward: mean_abs=0.8967399597167969, max_abs=5.5, mean_rel=0.27219539880752563, max_rel=2749.999755859375, norm_rel=0.02159642055630684, ref_abs_avg=41.735679626464844, test_abs_avg=41.74177551269531
production_forward2 grad[23] vs paper_forward: mean_abs=0.7713708877563477, max_abs=3.0, mean_rel=0.1173962652683258, max_rel=12.449349403381348, norm_rel=0.024130530655384064, ref_abs_avg=32.46794509887695, test_abs_avg=32.45622253417969
production_forward2 grad[24] vs paper_forward: mean_abs=0.9284851551055908, max_abs=6.5, mean_rel=0.15146403014659882, max_rel=1787.9537353515625, norm_rel=0.023070499300956726, ref_abs_avg=40.499053955078125, test_abs_avg=40.5016975402832
production_forward2 grad[25] vs paper_forward: mean_abs=0.8457838296890259, max_abs=6.0, mean_rel=0.36818942427635193, max_rel=2999.999755859375, norm_rel=0.021267099305987358, ref_abs_avg=39.9589729309082, test_abs_avg=39.96659469604492
production_forward2 grad[26] vs paper_forward: mean_abs=0.8308687210083008, max_abs=3.0, mean_rel=0.09598810970783234, max_rel=16.390926361083984, norm_rel=0.023328300565481186, ref_abs_avg=37.52936553955078, test_abs_avg=37.58188247680664
production_forward2 grad[27] vs paper_forward: mean_abs=1.0677039623260498, max_abs=8.0, mean_rel=0.16218158602714539, max_rel=899.214599609375, norm_rel=0.02508406713604927, ref_abs_avg=42.823944091796875, test_abs_avg=42.83042907714844
production_forward2 grad[28] vs paper_forward: mean_abs=0.9947465658187866, max_abs=6.5, mean_rel=0.3523964285850525, max_rel=3343.749755859375, norm_rel=0.023485859856009483, ref_abs_avg=42.562828063964844, test_abs_avg=42.56150817871094
production_forward2 grad[29] vs paper_forward: mean_abs=0.7505815029144287, max_abs=2.75, mean_rel=0.18028345704078674, max_rel=32.270790100097656, norm_rel=0.02416209504008293, ref_abs_avg=31.24262237548828, test_abs_avg=31.29916000366211
production_forward2 grad[30] vs paper_forward: mean_abs=0.9800207018852234, max_abs=7.0, mean_rel=0.17342311143875122, max_rel=2083.692138671875, norm_rel=0.025317730382084846, ref_abs_avg=38.870052337646484, test_abs_avg=38.871612548828125
production_forward2 grad[31] vs paper_forward: mean_abs=0.9153714179992676, max_abs=5.5, mean_rel=0.37045347690582275, max_rel=3796.874755859375, norm_rel=0.02384340763092041, ref_abs_avg=38.52650451660156, test_abs_avg=38.52333450317383
production_forward2 grad[32] vs paper_forward: mean_abs=0.7366743087768555, max_abs=2.75, mean_rel=0.08936694264411926, max_rel=5.63331937789917, norm_rel=0.024048538878560066, ref_abs_avg=30.558549880981445, test_abs_avg=30.585567474365234
production_forward2 grad[33] vs paper_forward: mean_abs=0.908899188041687, max_abs=6.0, mean_rel=0.15717002749443054, max_rel=944.84521484375, norm_rel=0.025224991142749786, ref_abs_avg=36.209434509277344, test_abs_avg=36.21437072753906
production_forward2 grad[34] vs paper_forward: mean_abs=0.8510398864746094, max_abs=5.5625, mean_rel=0.2826564908027649, max_rel=2562.5, norm_rel=0.023872029036283493, ref_abs_avg=35.802024841308594, test_abs_avg=35.81287384033203
production_forward2 grad[35] vs paper_forward: mean_abs=0.6568779945373535, max_abs=2.25, mean_rel=0.07796497642993927, max_rel=3.078361749649048, norm_rel=0.023872971534729004, ref_abs_avg=27.544532775878906, test_abs_avg=27.553791046142578
production_forward2 grad[36] vs paper_forward: mean_abs=0.8554261922836304, max_abs=5.75, mean_rel=0.1702873408794403, max_rel=1340.2972412109375, norm_rel=0.0250126663595438, ref_abs_avg=34.343727111816406, test_abs_avg=34.345611572265625
production_forward2 grad[37] vs paper_forward: mean_abs=0.803168535232544, max_abs=5.25, mean_rel=0.2969411611557007, max_rel=3312.499755859375, norm_rel=0.0235790703445673, ref_abs_avg=34.16360092163086, test_abs_avg=34.17047119140625
production_forward2 grad[38] vs paper_forward: mean_abs=0.6438932418823242, max_abs=2.5, mean_rel=0.09592531621456146, max_rel=6.235928535461426, norm_rel=0.024706486612558365, ref_abs_avg=25.943302154541016, test_abs_avg=25.919635772705078
production_forward2 grad[39] vs paper_forward: mean_abs=0.8075098395347595, max_abs=5.0, mean_rel=0.16064441204071045, max_rel=972.1456909179688, norm_rel=0.024792181327939034, ref_abs_avg=32.72142791748047, test_abs_avg=32.725921630859375
production_forward2 grad[40] vs paper_forward: mean_abs=0.7491952180862427, max_abs=5.0, mean_rel=0.2784050703048706, max_rel=2218.75, norm_rel=0.023236293345689774, ref_abs_avg=32.34596252441406, test_abs_avg=32.34212875366211
production_forward2 grad[41] vs paper_forward: mean_abs=0.5810902118682861, max_abs=3.0, mean_rel=0.1475619524717331, max_rel=25.170862197875977, norm_rel=0.022226553410291672, ref_abs_avg=26.07061767578125, test_abs_avg=26.06245231628418
production_forward2 grad[42] vs paper_forward: mean_abs=0.7681218385696411, max_abs=5.0, mean_rel=0.16225433349609375, max_rel=1315.9168701171875, norm_rel=0.024607397615909576, ref_abs_avg=31.270896911621094, test_abs_avg=31.276172637939453
production_forward2 grad[43] vs paper_forward: mean_abs=0.7156988382339478, max_abs=5.0, mean_rel=0.2552964687347412, max_rel=2937.499755859375, norm_rel=0.023198651149868965, ref_abs_avg=30.984146118164062, test_abs_avg=30.985870361328125
production_forward2 grad[44] vs paper_forward: mean_abs=0.5561738014221191, max_abs=2.4140625, mean_rel=0.1382761001586914, max_rel=6.388233661651611, norm_rel=0.023643743246793747, ref_abs_avg=23.46734046936035, test_abs_avg=23.47060775756836
production_forward2 grad[45] vs paper_forward: mean_abs=0.740708589553833, max_abs=5.0, mean_rel=0.1573866307735443, max_rel=1285.208984375, norm_rel=0.02452964149415493, ref_abs_avg=30.314678192138672, test_abs_avg=30.316490173339844
production_forward2 grad[46] vs paper_forward: mean_abs=0.6869274377822876, max_abs=4.25, mean_rel=0.29695016145706177, max_rel=1671.8748779296875, norm_rel=0.022934500128030777, ref_abs_avg=29.97277069091797, test_abs_avg=29.97228240966797
production_forward2 grad[47] vs paper_forward: mean_abs=0.5563688278198242, max_abs=2.25, mean_rel=0.39095810055732727, max_rel=99.72444152832031, norm_rel=0.02353522554039955, ref_abs_avg=23.42156982421875, test_abs_avg=23.415143966674805
production_forward2 grad[48] vs paper_forward: mean_abs=0.7071256637573242, max_abs=5.25, mean_rel=0.16351991891860962, max_rel=1439.33935546875, norm_rel=0.0243070088326931, ref_abs_avg=29.198394775390625, test_abs_avg=29.201374053955078
production_forward2 grad[49] vs paper_forward: mean_abs=0.6562867164611816, max_abs=4.5, mean_rel=0.23431959748268127, max_rel=2125.0, norm_rel=0.022809701040387154, ref_abs_avg=28.800018310546875, test_abs_avg=28.801715850830078
production_forward2 grad[50] vs paper_forward: mean_abs=0.58502197265625, max_abs=2.25, mean_rel=0.09862317144870758, max_rel=5.717874050140381, norm_rel=0.02340656891465187, ref_abs_avg=24.86937713623047, test_abs_avg=24.883136749267578
production_forward2 grad[51] vs paper_forward: mean_abs=0.7704035043716431, max_abs=6.0, mean_rel=0.16734136641025543, max_rel=1122.3126220703125, norm_rel=0.02561551332473755, ref_abs_avg=30.192096710205078, test_abs_avg=30.193941116333008
production_forward2 grad[52] vs paper_forward: mean_abs=0.7219679355621338, max_abs=5.0, mean_rel=0.2849991023540497, max_rel=2890.624755859375, norm_rel=0.023999620229005814, ref_abs_avg=30.187246322631836, test_abs_avg=30.189804077148438
production_forward2 grad[53] vs paper_forward: mean_abs=0.5614951848983765, max_abs=2.5, mean_rel=0.08430532366037369, max_rel=5.231665134429932, norm_rel=0.022906072437763214, ref_abs_avg=24.882644653320312, test_abs_avg=24.851852416992188
production_forward2 grad[54] vs paper_forward: mean_abs=0.7163781523704529, max_abs=5.0, mean_rel=0.15720027685165405, max_rel=1437.2762451171875, norm_rel=0.025012098252773285, ref_abs_avg=28.714839935302734, test_abs_avg=28.71712303161621
production_forward2 grad[55] vs paper_forward: mean_abs=0.6657389998435974, max_abs=4.125, mean_rel=0.25111880898475647, max_rel=2062.5, norm_rel=0.023351943120360374, ref_abs_avg=28.522306442260742, test_abs_avg=28.527751922607422
production_forward2 grad[56] vs paper_forward: mean_abs=0.5095744132995605, max_abs=2.25, mean_rel=0.19518205523490906, max_rel=42.859580993652344, norm_rel=0.02295200526714325, ref_abs_avg=23.046619415283203, test_abs_avg=23.07625961303711
production_forward2 grad[57] vs paper_forward: mean_abs=0.6686404347419739, max_abs=5.0, mean_rel=0.16154447197914124, max_rel=1118.7974853515625, norm_rel=0.024532826617360115, ref_abs_avg=27.266088485717773, test_abs_avg=27.268333435058594
production_forward2 grad[58] vs paper_forward: mean_abs=0.6194443702697754, max_abs=4.25, mean_rel=0.24027681350708008, max_rel=1937.4998779296875, norm_rel=0.023280205205082893, ref_abs_avg=26.593229293823242, test_abs_avg=26.595840454101562
production_forward2 grad[59] vs paper_forward: mean_abs=0.5124139785766602, max_abs=1.75, mean_rel=0.0786282867193222, max_rel=5.824002742767334, norm_rel=0.02208160050213337, ref_abs_avg=23.038429260253906, test_abs_avg=23.022817611694336
production_forward2 grad[60] vs paper_forward: mean_abs=0.6316596865653992, max_abs=5.0, mean_rel=0.14799939095973969, max_rel=550.9259033203125, norm_rel=0.024288209155201912, ref_abs_avg=26.06376838684082, test_abs_avg=26.064678192138672
production_forward2 grad[61] vs paper_forward: mean_abs=0.5897327661514282, max_abs=4.25, mean_rel=0.25648751854896545, max_rel=3656.249755859375, norm_rel=0.022908655926585197, ref_abs_avg=25.800739288330078, test_abs_avg=25.802719116210938
production_forward2 grad[62] vs paper_forward: mean_abs=0.43053340911865234, max_abs=1.75, mean_rel=0.06323116272687912, max_rel=4.847390651702881, norm_rel=0.021135419607162476, ref_abs_avg=20.97031021118164, test_abs_avg=20.973730087280273
production_forward2 grad[63] vs paper_forward: mean_abs=0.5960235595703125, max_abs=5.0, mean_rel=0.1608661413192749, max_rel=872.1532592773438, norm_rel=0.023990042507648468, ref_abs_avg=24.897809982299805, test_abs_avg=24.896869659423828
production_forward2 grad[64] vs paper_forward: mean_abs=0.5533135533332825, max_abs=4.5, mean_rel=0.2618020474910736, max_rel=1937.4998779296875, norm_rel=0.022328583523631096, ref_abs_avg=24.801372528076172, test_abs_avg=24.806564331054688
production_forward2 grad[65] vs paper_forward: mean_abs=0.4332294464111328, max_abs=2.0, mean_rel=0.07927653193473816, max_rel=12.46036148071289, norm_rel=0.02075193263590336, ref_abs_avg=21.69683265686035, test_abs_avg=21.70431900024414
production_forward2 grad[66] vs paper_forward: mean_abs=0.569179356098175, max_abs=4.0, mean_rel=0.15130087733268738, max_rel=1288.48291015625, norm_rel=0.023594452068209648, ref_abs_avg=24.15422821044922, test_abs_avg=24.15627670288086
production_forward2 grad[67] vs paper_forward: mean_abs=0.5286988615989685, max_abs=3.625, mean_rel=0.23108522593975067, max_rel=1843.7498779296875, norm_rel=0.022611185908317566, ref_abs_avg=23.474653244018555, test_abs_avg=23.475780487060547
production_forward2 grad[68] vs paper_forward: mean_abs=0.44349074363708496, max_abs=1.8125, mean_rel=0.095668263733387, max_rel=9.489749908447266, norm_rel=0.023359281942248344, ref_abs_avg=19.061321258544922, test_abs_avg=19.058185577392578
production_forward2 grad[69] vs paper_forward: mean_abs=0.5457509756088257, max_abs=4.5, mean_rel=0.15251626074314117, max_rel=719.14306640625, norm_rel=0.02353360503911972, ref_abs_avg=23.195430755615234, test_abs_avg=23.196674346923828
production_forward2 grad[70] vs paper_forward: mean_abs=0.5063903331756592, max_abs=4.625, mean_rel=0.22187843918800354, max_rel=1374.9998779296875, norm_rel=0.02202502451837063, ref_abs_avg=22.9887752532959, test_abs_avg=22.988065719604492
production_forward2 grad[71] vs paper_forward: mean_abs=0.41698265075683594, max_abs=1.625, mean_rel=0.13761641085147858, max_rel=12.611499786376953, norm_rel=0.022660791873931885, ref_abs_avg=18.16246223449707, test_abs_avg=18.219593048095703
production_forward2 grad[72] vs paper_forward: mean_abs=0.5177488327026367, max_abs=6.0, mean_rel=0.15176326036453247, max_rel=887.2609252929688, norm_rel=0.023031514137983322, ref_abs_avg=22.501861572265625, test_abs_avg=22.50212287902832
production_forward2 grad[73] vs paper_forward: mean_abs=0.47873640060424805, max_abs=3.25, mean_rel=0.20643994212150574, max_rel=1187.5, norm_rel=0.021650681272149086, ref_abs_avg=22.17121124267578, test_abs_avg=22.171220779418945
production_forward2 grad[74] vs paper_forward: mean_abs=0.46165725588798523, max_abs=1.75, mean_rel=0.11740326881408691, max_rel=13.495622634887695, norm_rel=0.024846825748682022, ref_abs_avg=18.839786529541016, test_abs_avg=18.849136352539062
production_forward2 grad[75] vs paper_forward: mean_abs=0.5751758217811584, max_abs=5.5, mean_rel=0.1545742303133011, max_rel=1086.38671875, norm_rel=0.024230562150478363, ref_abs_avg=23.815685272216797, test_abs_avg=23.814083099365234
production_forward2 grad[76] vs paper_forward: mean_abs=0.5365727543830872, max_abs=4.0, mean_rel=0.22749334573745728, max_rel=1843.7498779296875, norm_rel=0.022541118785738945, ref_abs_avg=23.88721466064453, test_abs_avg=23.893962860107422
production_forward2 grad[77] vs paper_forward: mean_abs=0.4328773021697998, max_abs=1.5, mean_rel=0.1528884768486023, max_rel=20.161706924438477, norm_rel=0.022647250443696976, ref_abs_avg=19.327667236328125, test_abs_avg=19.316463470458984
production_forward2 grad[78] vs paper_forward: mean_abs=0.5374292135238647, max_abs=4.5, mean_rel=0.14564166963100433, max_rel=587.7843017578125, norm_rel=0.023675406351685524, ref_abs_avg=22.784852981567383, test_abs_avg=22.784799575805664
production_forward2 grad[79] vs paper_forward: mean_abs=0.49705395102500916, max_abs=4.0, mean_rel=0.19476643204689026, max_rel=1281.25, norm_rel=0.022006437182426453, ref_abs_avg=22.60968589782715, test_abs_avg=22.614742279052734
production_forward2 grad[80] vs paper_forward: mean_abs=0.39306962490081787, max_abs=1.2890625, mean_rel=0.6370588541030884, max_rel=245.03663635253906, norm_rel=0.022396383807063103, ref_abs_avg=17.64107322692871, test_abs_avg=17.620223999023438
production_forward2 grad[81] vs paper_forward: mean_abs=0.5102580785751343, max_abs=5.0, mean_rel=0.1433180868625641, max_rel=734.8797607421875, norm_rel=0.023080525919795036, ref_abs_avg=22.152057647705078, test_abs_avg=22.15187644958496
production_forward2 grad[82] vs paper_forward: mean_abs=0.46662282943725586, max_abs=4.25, mean_rel=0.20857685804367065, max_rel=1312.4998779296875, norm_rel=0.02166549675166607, ref_abs_avg=21.557828903198242, test_abs_avg=21.553199768066406
production_forward2 grad[83] vs paper_forward: mean_abs=0.3604913651943207, max_abs=1.25, mean_rel=0.13703829050064087, max_rel=11.354249000549316, norm_rel=0.020438354462385178, ref_abs_avg=17.42448616027832, test_abs_avg=17.41387176513672
production_forward2 grad[84] vs paper_forward: mean_abs=0.46754103899002075, max_abs=4.5, mean_rel=0.14707350730895996, max_rel=1070.1070556640625, norm_rel=0.022625699639320374, ref_abs_avg=20.756254196166992, test_abs_avg=20.756244659423828
production_forward2 grad[85] vs paper_forward: mean_abs=0.4394727349281311, max_abs=4.53125, mean_rel=0.20666146278381348, max_rel=1874.9998779296875, norm_rel=0.021003790199756622, ref_abs_avg=20.99995994567871, test_abs_avg=21.007102966308594
production_forward2 grad[86] vs paper_forward: mean_abs=0.3394787311553955, max_abs=1.34375, mean_rel=0.10791299492120743, max_rel=9.086942672729492, norm_rel=0.019247813150286674, ref_abs_avg=17.790767669677734, test_abs_avg=17.777114868164062
production_forward2 grad[87] vs paper_forward: mean_abs=0.4470929503440857, max_abs=4.375, mean_rel=0.13723893463611603, max_rel=701.9793701171875, norm_rel=0.022110430523753166, ref_abs_avg=20.353191375732422, test_abs_avg=20.352643966674805
production_forward2 grad[88] vs paper_forward: mean_abs=0.40752917528152466, max_abs=4.1875, mean_rel=0.18865495920181274, max_rel=1312.4998779296875, norm_rel=0.020790111273527145, ref_abs_avg=19.777435302734375, test_abs_avg=19.784366607666016
production_forward2 grad[89] vs paper_forward: mean_abs=0.326448917388916, max_abs=1.25, mean_rel=0.14266221225261688, max_rel=7.919996738433838, norm_rel=0.01991857774555683, ref_abs_avg=16.15457534790039, test_abs_avg=16.184955596923828
production_forward2 grad[90] vs paper_forward: mean_abs=0.4254674017429352, max_abs=6.0, mean_rel=0.1286354511976242, max_rel=702.1995849609375, norm_rel=0.021715456619858742, ref_abs_avg=19.763397216796875, test_abs_avg=19.76457405090332
production_forward2 grad[91] vs paper_forward: mean_abs=0.3896591067314148, max_abs=4.40625, mean_rel=0.1802660971879959, max_rel=1812.4998779296875, norm_rel=0.020044611766934395, ref_abs_avg=19.591726303100586, test_abs_avg=19.592323303222656
production_forward2 grad[92] vs paper_forward: mean_abs=0.3078598976135254, max_abs=1.5, mean_rel=0.05147252231836319, max_rel=2.433365821838379, norm_rel=0.01833803951740265, ref_abs_avg=16.913660049438477, test_abs_avg=16.909252166748047
production_forward2 grad[93] vs paper_forward: mean_abs=0.4032936096191406, max_abs=5.0, mean_rel=0.13354916870594025, max_rel=746.3300170898438, norm_rel=0.021394157782197, ref_abs_avg=19.077430725097656, test_abs_avg=19.077842712402344
production_forward2 grad[94] vs paper_forward: mean_abs=0.3577747344970703, max_abs=4.0, mean_rel=0.19079332053661346, max_rel=1390.6248779296875, norm_rel=0.019434720277786255, ref_abs_avg=18.60382843017578, test_abs_avg=18.60901641845703
production_forward2 grad[95] vs paper_forward: mean_abs=0.30394887924194336, max_abs=1.5, mean_rel=0.06942612677812576, max_rel=2.9782164096832275, norm_rel=0.01984385773539543, ref_abs_avg=15.25815486907959, test_abs_avg=15.275117874145508
production_forward2 grad[96] vs paper_forward: mean_abs=0.3640972971916199, max_abs=4.0, mean_rel=0.1216895580291748, max_rel=464.0430603027344, norm_rel=0.02065253257751465, ref_abs_avg=17.935993194580078, test_abs_avg=17.935558319091797
production_forward2 grad[97] vs paper_forward: mean_abs=0.34079134464263916, max_abs=3.5, mean_rel=0.1658056378364563, max_rel=1171.875, norm_rel=0.01904706284403801, ref_abs_avg=18.14366912841797, test_abs_avg=18.1508731842041
identity layers + randn queries
production_forward fwd+bwd:  113.536 ms
production_forward bwd-only: 95.956 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.322 GiB, fwd+bwd=10.322 GiB
torch_compile_phases_forward fwd+bwd:  168.745 ms
torch_compile_phases_forward bwd-only: 132.769 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB
paper_forward fwd+bwd:  382.266 ms
paper_forward bwd-only: 302.275 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.742 GiB, fwd+bwd=32.492 GiB
production_forward2 fwd+bwd:  114.490 ms
production_forward2 bwd-only: 96.080 ms
production_forward2 peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=2.322 GiB, fwd+bwd=10.322 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016344812465831637, max_abs=0.0546875
production_forward grad[0] vs paper_forward: mean_abs=0.008324876427650452, max_abs=0.4921875, mean_rel=0.07264275848865509, max_rel=283.4215393066406, norm_rel=0.019882801920175552, ref_abs_avg=0.45441845059394836, test_abs_avg=0.45443838834762573
production_forward grad[1] vs paper_forward: mean_abs=7.20973539352417, max_abs=56.0, mean_rel=0.12589238584041595, max_rel=109.22783660888672, norm_rel=0.02015644498169422, ref_abs_avg=314.8078308105469, test_abs_avg=314.8314208984375
production_forward grad[2] vs paper_forward: mean_abs=1.2054100036621094, max_abs=5.0, mean_rel=0.16519300639629364, max_rel=26.802377700805664, norm_rel=0.022662343457341194, ref_abs_avg=54.17042541503906, test_abs_avg=54.13069152832031
production_forward grad[3] vs paper_forward: mean_abs=1.572697639465332, max_abs=11.5, mean_rel=0.16635273396968842, max_rel=1708.16064453125, norm_rel=0.0245023462921381, ref_abs_avg=64.59895324707031, test_abs_avg=64.60001373291016
production_forward grad[4] vs paper_forward: mean_abs=1.4564557075500488, max_abs=9.0, mean_rel=0.4540036916732788, max_rel=5687.49951171875, norm_rel=0.023081105202436447, ref_abs_avg=63.53062438964844, test_abs_avg=63.530738830566406
production_forward grad[5] vs paper_forward: mean_abs=1.1565048694610596, max_abs=3.75, mean_rel=0.3264463245868683, max_rel=118.48311614990234, norm_rel=0.0243802759796381, ref_abs_avg=45.64159393310547, test_abs_avg=45.512184143066406
production_forward grad[6] vs paper_forward: mean_abs=1.382358431816101, max_abs=10.0, mean_rel=0.16395172476768494, max_rel=2476.4873046875, norm_rel=0.024326996877789497, ref_abs_avg=57.282318115234375, test_abs_avg=57.28483200073242
production_forward grad[7] vs paper_forward: mean_abs=1.26904296875, max_abs=8.0, mean_rel=0.41066139936447144, max_rel=4625.0, norm_rel=0.02258337289094925, ref_abs_avg=56.54450225830078, test_abs_avg=56.55049133300781
production_forward grad[8] vs paper_forward: mean_abs=1.019148349761963, max_abs=4.375, mean_rel=0.1131591945886612, max_rel=12.49057674407959, norm_rel=0.023569481447339058, ref_abs_avg=43.430355072021484, test_abs_avg=43.40074157714844
production_forward grad[9] vs paper_forward: mean_abs=1.2644119262695312, max_abs=9.0, mean_rel=0.15877923369407654, max_rel=899.4017333984375, norm_rel=0.024045972153544426, ref_abs_avg=52.95016098022461, test_abs_avg=52.94877624511719
production_forward grad[10] vs paper_forward: mean_abs=1.1589603424072266, max_abs=6.6875, mean_rel=0.4415279030799866, max_rel=4312.5, norm_rel=0.0222710482776165, ref_abs_avg=52.29153823852539, test_abs_avg=52.28376007080078
production_forward grad[11] vs paper_forward: mean_abs=0.9520168304443359, max_abs=3.28125, mean_rel=0.09387747943401337, max_rel=4.666011810302734, norm_rel=0.022734716534614563, ref_abs_avg=41.679283142089844, test_abs_avg=41.70176696777344
production_forward grad[12] vs paper_forward: mean_abs=1.1724166870117188, max_abs=10.0, mean_rel=0.17474539577960968, max_rel=2061.329345703125, norm_rel=0.023879561573266983, ref_abs_avg=49.442909240722656, test_abs_avg=49.442657470703125
production_forward grad[13] vs paper_forward: mean_abs=1.0802733898162842, max_abs=6.625, mean_rel=0.32799994945526123, max_rel=3406.249755859375, norm_rel=0.022349948063492775, ref_abs_avg=48.61642837524414, test_abs_avg=48.620941162109375
production_forward grad[14] vs paper_forward: mean_abs=0.8798007965087891, max_abs=3.375, mean_rel=0.11107034981250763, max_rel=21.966724395751953, norm_rel=0.021704459562897682, ref_abs_avg=40.14910888671875, test_abs_avg=40.170005798339844
production_forward grad[15] vs paper_forward: mean_abs=1.0985356569290161, max_abs=8.0, mean_rel=0.1639106720685959, max_rel=1756.24853515625, norm_rel=0.023814469575881958, ref_abs_avg=46.412269592285156, test_abs_avg=46.412864685058594
production_forward grad[16] vs paper_forward: mean_abs=1.0129797458648682, max_abs=6.75, mean_rel=0.31035280227661133, max_rel=3374.999755859375, norm_rel=0.02226846292614937, ref_abs_avg=45.70769500732422, test_abs_avg=45.70228576660156
production_forward grad[17] vs paper_forward: mean_abs=0.8211588859558105, max_abs=3.328125, mean_rel=0.08413796126842499, max_rel=4.783606052398682, norm_rel=0.02208445779979229, ref_abs_avg=37.31761932373047, test_abs_avg=37.387184143066406
production_forward grad[18] vs paper_forward: mean_abs=1.0314135551452637, max_abs=7.0, mean_rel=0.17480459809303284, max_rel=1756.1866455078125, norm_rel=0.02369278483092785, ref_abs_avg=43.81536865234375, test_abs_avg=43.81806564331055
production_forward grad[19] vs paper_forward: mean_abs=0.9454120397567749, max_abs=6.5, mean_rel=0.32190483808517456, max_rel=3281.249755859375, norm_rel=0.021848997101187706, ref_abs_avg=43.551422119140625, test_abs_avg=43.55466079711914
production_forward grad[20] vs paper_forward: mean_abs=0.7588176727294922, max_abs=3.5, mean_rel=0.11673389375209808, max_rel=15.264082908630371, norm_rel=0.022033562883734703, ref_abs_avg=35.294403076171875, test_abs_avg=35.36418533325195
production_forward grad[21] vs paper_forward: mean_abs=0.9814146757125854, max_abs=6.0, mean_rel=0.1588641107082367, max_rel=2040.828125, norm_rel=0.02357647940516472, ref_abs_avg=41.891334533691406, test_abs_avg=41.89385223388672
production_forward grad[22] vs paper_forward: mean_abs=0.8964749574661255, max_abs=5.0, mean_rel=0.26485905051231384, max_rel=2375.0, norm_rel=0.021682431921362877, ref_abs_avg=41.529502868652344, test_abs_avg=41.52964401245117
production_forward grad[23] vs paper_forward: mean_abs=0.703031063079834, max_abs=2.5390625, mean_rel=0.1194927766919136, max_rel=14.6292142868042, norm_rel=0.021398894488811493, ref_abs_avg=33.006980895996094, test_abs_avg=32.95063400268555
production_forward grad[24] vs paper_forward: mean_abs=0.9345271587371826, max_abs=7.0, mean_rel=0.1556698977947235, max_rel=2506.144287109375, norm_rel=0.02338619902729988, ref_abs_avg=40.20010757446289, test_abs_avg=40.203887939453125
production_forward grad[25] vs paper_forward: mean_abs=0.8575114011764526, max_abs=5.375, mean_rel=0.3245236873626709, max_rel=3187.499755859375, norm_rel=0.02173038385808468, ref_abs_avg=39.634490966796875, test_abs_avg=39.641727447509766
production_forward grad[26] vs paper_forward: mean_abs=0.8625357151031494, max_abs=3.5, mean_rel=0.15501615405082703, max_rel=18.392236709594727, norm_rel=0.024823257699608803, ref_abs_avg=34.43836975097656, test_abs_avg=34.44978713989258
production_forward grad[27] vs paper_forward: mean_abs=1.075392246246338, max_abs=7.0, mean_rel=0.1760292500257492, max_rel=2293.4619140625, norm_rel=0.025202011689543724, ref_abs_avg=42.913673400878906, test_abs_avg=42.91542434692383
production_forward grad[28] vs paper_forward: mean_abs=0.9958857297897339, max_abs=5.75, mean_rel=0.3264125883579254, max_rel=2812.499755859375, norm_rel=0.023668572306632996, ref_abs_avg=42.272499084472656, test_abs_avg=42.27351379394531
production_forward grad[29] vs paper_forward: mean_abs=0.7913985252380371, max_abs=3.5, mean_rel=0.13226750493049622, max_rel=13.093015670776367, norm_rel=0.023747609928250313, ref_abs_avg=32.768577575683594, test_abs_avg=32.79868698120117
production_forward grad[30] vs paper_forward: mean_abs=1.002370834350586, max_abs=6.5, mean_rel=0.1618727594614029, max_rel=1903.4202880859375, norm_rel=0.025619998574256897, ref_abs_avg=39.30220031738281, test_abs_avg=39.304649353027344
production_forward grad[31] vs paper_forward: mean_abs=0.9302586317062378, max_abs=5.5, mean_rel=0.3399850130081177, max_rel=3406.249755859375, norm_rel=0.024146493524312973, ref_abs_avg=38.66518783569336, test_abs_avg=38.67036819458008
production_forward grad[32] vs paper_forward: mean_abs=0.7158082723617554, max_abs=3.09375, mean_rel=0.30337610840797424, max_rel=109.43256378173828, norm_rel=0.024462031200528145, ref_abs_avg=29.249446868896484, test_abs_avg=29.28200340270996
production_forward grad[33] vs paper_forward: mean_abs=0.9207857251167297, max_abs=7.0, mean_rel=0.17172756791114807, max_rel=1243.12060546875, norm_rel=0.0254783034324646, ref_abs_avg=36.30282211303711, test_abs_avg=36.304569244384766
production_forward grad[34] vs paper_forward: mean_abs=0.8645457625389099, max_abs=5.5625, mean_rel=0.29668128490448, max_rel=2749.999755859375, norm_rel=0.023919319733977318, ref_abs_avg=36.244895935058594, test_abs_avg=36.245506286621094
production_forward grad[35] vs paper_forward: mean_abs=0.6962201595306396, max_abs=3.3125, mean_rel=0.13154925405979156, max_rel=17.695405960083008, norm_rel=0.02448827400803566, ref_abs_avg=28.26502799987793, test_abs_avg=28.337108612060547
production_forward grad[36] vs paper_forward: mean_abs=0.8721567392349243, max_abs=6.0, mean_rel=0.17605531215667725, max_rel=1886.6405029296875, norm_rel=0.025220785290002823, ref_abs_avg=34.75468826293945, test_abs_avg=34.75828552246094
production_forward grad[37] vs paper_forward: mean_abs=0.8132742047309875, max_abs=6.25, mean_rel=0.33419597148895264, max_rel=2562.5, norm_rel=0.02393346093595028, ref_abs_avg=34.067928314208984, test_abs_avg=34.075496673583984
production_forward grad[38] vs paper_forward: mean_abs=0.6339473724365234, max_abs=3.125, mean_rel=0.0910908430814743, max_rel=5.149725437164307, norm_rel=0.022774316370487213, ref_abs_avg=28.468690872192383, test_abs_avg=28.457901000976562
production_forward grad[39] vs paper_forward: mean_abs=0.8259449005126953, max_abs=6.5, mean_rel=0.15975156426429749, max_rel=1376.2867431640625, norm_rel=0.025098854675889015, ref_abs_avg=33.04030227661133, test_abs_avg=33.041046142578125
production_forward grad[40] vs paper_forward: mean_abs=0.7681624889373779, max_abs=5.875, mean_rel=0.2530667185783386, max_rel=3531.249755859375, norm_rel=0.023578159511089325, ref_abs_avg=32.67892837524414, test_abs_avg=32.6832389831543
production_forward grad[41] vs paper_forward: mean_abs=0.6363635063171387, max_abs=2.6875, mean_rel=0.13422314822673798, max_rel=24.827367782592773, norm_rel=0.02332434244453907, ref_abs_avg=27.595428466796875, test_abs_avg=27.55910873413086
production_forward grad[42] vs paper_forward: mean_abs=0.78684401512146, max_abs=6.0, mean_rel=0.17024719715118408, max_rel=1536.8748779296875, norm_rel=0.024691978469491005, ref_abs_avg=31.95787811279297, test_abs_avg=31.960540771484375
production_forward grad[43] vs paper_forward: mean_abs=0.7301410436630249, max_abs=5.0, mean_rel=0.23830009996891022, max_rel=2187.5, norm_rel=0.023437952622771263, ref_abs_avg=31.2647762298584, test_abs_avg=31.2667236328125
production_forward grad[44] vs paper_forward: mean_abs=0.6108589172363281, max_abs=2.25, mean_rel=0.0734115019440651, max_rel=3.2559831142425537, norm_rel=0.023268800228834152, ref_abs_avg=26.227962493896484, test_abs_avg=26.17993927001953
production_forward grad[45] vs paper_forward: mean_abs=0.744001030921936, max_abs=5.5, mean_rel=0.16227677464485168, max_rel=1437.3638916015625, norm_rel=0.024641405791044235, ref_abs_avg=30.270509719848633, test_abs_avg=30.27100372314453
production_forward grad[46] vs paper_forward: mean_abs=0.6921640038490295, max_abs=4.25, mean_rel=0.27829062938690186, max_rel=2093.75, norm_rel=0.023280732333660126, ref_abs_avg=29.794906616210938, test_abs_avg=29.802440643310547
production_forward grad[47] vs paper_forward: mean_abs=0.540158748626709, max_abs=2.25, mean_rel=0.1023159921169281, max_rel=4.426397323608398, norm_rel=0.022706078365445137, ref_abs_avg=24.01276969909668, test_abs_avg=24.02025604248047
production_forward grad[48] vs paper_forward: mean_abs=0.716802716255188, max_abs=5.0, mean_rel=0.1498228907585144, max_rel=502.23095703125, norm_rel=0.024346409365534782, ref_abs_avg=29.52189826965332, test_abs_avg=29.523279190063477
production_forward grad[49] vs paper_forward: mean_abs=0.6693341732025146, max_abs=5.0, mean_rel=0.3275161385536194, max_rel=2562.5, norm_rel=0.022858085110783577, ref_abs_avg=29.35482406616211, test_abs_avg=29.356910705566406
production_forward grad[50] vs paper_forward: mean_abs=0.5915709733963013, max_abs=2.5625, mean_rel=0.1588262915611267, max_rel=23.491161346435547, norm_rel=0.02314768172800541, ref_abs_avg=25.638328552246094, test_abs_avg=25.646018981933594
production_forward grad[51] vs paper_forward: mean_abs=0.809301495552063, max_abs=6.0, mean_rel=0.1691647171974182, max_rel=1418.4774169921875, norm_rel=0.026158466935157776, ref_abs_avg=31.053377151489258, test_abs_avg=31.05554962158203
production_forward grad[52] vs paper_forward: mean_abs=0.7472457885742188, max_abs=5.0, mean_rel=0.30112558603286743, max_rel=2375.0, norm_rel=0.024439403787255287, ref_abs_avg=30.655466079711914, test_abs_avg=30.657625198364258
production_forward grad[53] vs paper_forward: mean_abs=0.5897393226623535, max_abs=2.5, mean_rel=0.3062290847301483, max_rel=93.3399887084961, norm_rel=0.024296654388308525, ref_abs_avg=24.42874526977539, test_abs_avg=24.451419830322266
production_forward grad[54] vs paper_forward: mean_abs=0.7325382828712463, max_abs=5.5, mean_rel=0.17428700625896454, max_rel=1026.299072265625, norm_rel=0.025451937690377235, ref_abs_avg=28.868560791015625, test_abs_avg=28.869915008544922
production_forward grad[55] vs paper_forward: mean_abs=0.6831139326095581, max_abs=4.375, mean_rel=0.2576141357421875, max_rel=2203.125, norm_rel=0.024191802367568016, ref_abs_avg=28.32145881652832, test_abs_avg=28.3164119720459
production_forward grad[56] vs paper_forward: mean_abs=0.5148220062255859, max_abs=2.25, mean_rel=0.15096960961818695, max_rel=25.214630126953125, norm_rel=0.02372078225016594, ref_abs_avg=22.281295776367188, test_abs_avg=22.31824493408203
production_forward grad[57] vs paper_forward: mean_abs=0.6784414052963257, max_abs=5.09375, mean_rel=0.16464078426361084, max_rel=1554.12744140625, norm_rel=0.0249446090310812, ref_abs_avg=27.271678924560547, test_abs_avg=27.273178100585938
production_forward grad[58] vs paper_forward: mean_abs=0.6308825612068176, max_abs=4.25, mean_rel=0.259933739900589, max_rel=1843.7498779296875, norm_rel=0.023284297436475754, ref_abs_avg=27.106552124023438, test_abs_avg=27.10440444946289
production_forward grad[59] vs paper_forward: mean_abs=0.4772624969482422, max_abs=2.234375, mean_rel=0.08047029376029968, max_rel=3.0587661266326904, norm_rel=0.02108473889529705, ref_abs_avg=22.71902084350586, test_abs_avg=22.760189056396484
production_forward grad[60] vs paper_forward: mean_abs=0.6453008651733398, max_abs=5.5, mean_rel=0.15544962882995605, max_rel=1366.309326171875, norm_rel=0.02462136372923851, ref_abs_avg=26.239456176757812, test_abs_avg=26.24004554748535
production_forward grad[61] vs paper_forward: mean_abs=0.5954256057739258, max_abs=3.875, mean_rel=0.25092780590057373, max_rel=1531.2498779296875, norm_rel=0.02310998924076557, ref_abs_avg=25.771766662597656, test_abs_avg=25.767818450927734
production_forward grad[62] vs paper_forward: mean_abs=0.46674370765686035, max_abs=1.875, mean_rel=0.10095608234405518, max_rel=15.992758750915527, norm_rel=0.022342275828123093, ref_abs_avg=20.66823387145996, test_abs_avg=20.65064811706543
production_forward grad[63] vs paper_forward: mean_abs=0.606048583984375, max_abs=4.75, mean_rel=0.15518198907375336, max_rel=1314.84912109375, norm_rel=0.024298712611198425, ref_abs_avg=24.959190368652344, test_abs_avg=24.961204528808594
production_forward grad[64] vs paper_forward: mean_abs=0.5619937181472778, max_abs=4.0, mean_rel=0.24362260103225708, max_rel=1999.9998779296875, norm_rel=0.022542107850313187, ref_abs_avg=25.00151824951172, test_abs_avg=25.007972717285156
production_forward grad[65] vs paper_forward: mean_abs=0.43162429332733154, max_abs=1.625, mean_rel=0.14003680646419525, max_rel=29.524276733398438, norm_rel=0.02226918563246727, ref_abs_avg=19.801610946655273, test_abs_avg=19.79336166381836
production_forward grad[66] vs paper_forward: mean_abs=0.5744530558586121, max_abs=5.0, mean_rel=0.14972372353076935, max_rel=1051.5938720703125, norm_rel=0.02381473407149315, ref_abs_avg=24.149688720703125, test_abs_avg=24.150592803955078
production_forward grad[67] vs paper_forward: mean_abs=0.5317139625549316, max_abs=3.75, mean_rel=0.22657278180122375, max_rel=2125.0, norm_rel=0.022055257111787796, ref_abs_avg=24.06630516052246, test_abs_avg=24.074918746948242
production_forward grad[68] vs paper_forward: mean_abs=0.4469224214553833, max_abs=1.9375, mean_rel=0.13553911447525024, max_rel=12.580076217651367, norm_rel=0.02279265597462654, ref_abs_avg=19.509904861450195, test_abs_avg=19.507966995239258
production_forward grad[69] vs paper_forward: mean_abs=0.5455092191696167, max_abs=4.5, mean_rel=0.14785116910934448, max_rel=917.2681274414062, norm_rel=0.02358219027519226, ref_abs_avg=23.191558837890625, test_abs_avg=23.191547393798828
production_forward grad[70] vs paper_forward: mean_abs=0.5099676847457886, max_abs=3.5, mean_rel=0.21880516409873962, max_rel=1374.9998779296875, norm_rel=0.021929115056991577, ref_abs_avg=23.200538635253906, test_abs_avg=23.20367431640625
production_forward grad[71] vs paper_forward: mean_abs=0.40503787994384766, max_abs=1.625, mean_rel=0.14204567670822144, max_rel=34.29736328125, norm_rel=0.021386723965406418, ref_abs_avg=18.65290641784668, test_abs_avg=18.635963439941406
production_forward grad[72] vs paper_forward: mean_abs=0.5276790261268616, max_abs=5.0, mean_rel=0.14417968690395355, max_rel=761.7653198242188, norm_rel=0.022901935502886772, ref_abs_avg=23.017032623291016, test_abs_avg=23.017887115478516
production_forward grad[73] vs paper_forward: mean_abs=0.4760313630104065, max_abs=3.5, mean_rel=0.22402159869670868, max_rel=1562.4998779296875, norm_rel=0.021291038021445274, ref_abs_avg=22.395889282226562, test_abs_avg=22.396011352539062
production_forward grad[74] vs paper_forward: mean_abs=0.4329805374145508, max_abs=1.75, mean_rel=0.10480864346027374, max_rel=7.766636371612549, norm_rel=0.022212402895092964, ref_abs_avg=20.05221176147461, test_abs_avg=20.10350799560547
production_forward grad[75] vs paper_forward: mean_abs=0.5772093534469604, max_abs=4.5, mean_rel=0.15772730112075806, max_rel=992.3341674804688, norm_rel=0.024606775492429733, ref_abs_avg=23.52664566040039, test_abs_avg=23.52631378173828
production_forward grad[76] vs paper_forward: mean_abs=0.542076826095581, max_abs=4.0625, mean_rel=0.24647584557533264, max_rel=1749.9998779296875, norm_rel=0.023136893287301064, ref_abs_avg=23.46246337890625, test_abs_avg=23.459726333618164
production_forward grad[77] vs paper_forward: mean_abs=0.4140145182609558, max_abs=1.875, mean_rel=0.14196738600730896, max_rel=22.35970115661621, norm_rel=0.021325185894966125, ref_abs_avg=19.21063804626465, test_abs_avg=19.223167419433594
production_forward grad[78] vs paper_forward: mean_abs=0.5346622467041016, max_abs=6.0, mean_rel=0.14582978188991547, max_rel=1641.7615966796875, norm_rel=0.023964716121554375, ref_abs_avg=22.370532989501953, test_abs_avg=22.371593475341797
production_forward grad[79] vs paper_forward: mean_abs=0.4908364415168762, max_abs=3.5, mean_rel=0.2799050509929657, max_rel=2250.0, norm_rel=0.02249765954911709, ref_abs_avg=21.85501480102539, test_abs_avg=21.849334716796875
production_forward grad[80] vs paper_forward: mean_abs=0.3872408866882324, max_abs=2.0, mean_rel=0.13463851809501648, max_rel=28.36454200744629, norm_rel=0.022891340777277946, ref_abs_avg=16.17538833618164, test_abs_avg=16.167827606201172
production_forward grad[81] vs paper_forward: mean_abs=0.4931587874889374, max_abs=4.5, mean_rel=0.1475352942943573, max_rel=581.4579467773438, norm_rel=0.023353004828095436, ref_abs_avg=21.172136306762695, test_abs_avg=21.17261505126953
production_forward grad[82] vs paper_forward: mean_abs=0.45792752504348755, max_abs=3.5, mean_rel=0.22558221220970154, max_rel=1499.9998779296875, norm_rel=0.02203787863254547, ref_abs_avg=20.885786056518555, test_abs_avg=20.883853912353516
production_forward grad[83] vs paper_forward: mean_abs=0.3398439884185791, max_abs=1.5, mean_rel=0.13917392492294312, max_rel=28.169952392578125, norm_rel=0.0208427757024765, ref_abs_avg=16.810569763183594, test_abs_avg=16.81578826904297
production_forward grad[84] vs paper_forward: mean_abs=0.46156466007232666, max_abs=4.0, mean_rel=0.13771679997444153, max_rel=525.9137573242188, norm_rel=0.022861763834953308, ref_abs_avg=20.239524841308594, test_abs_avg=20.240684509277344
production_forward grad[85] vs paper_forward: mean_abs=0.4260503053665161, max_abs=3.8125, mean_rel=0.19024038314819336, max_rel=968.7499389648438, norm_rel=0.021475432440638542, ref_abs_avg=19.9625244140625, test_abs_avg=19.963077545166016
production_forward grad[86] vs paper_forward: mean_abs=0.3482193946838379, max_abs=1.291015625, mean_rel=0.08990196138620377, max_rel=8.380989074707031, norm_rel=0.02088778093457222, ref_abs_avg=16.544313430786133, test_abs_avg=16.581247329711914
production_forward grad[87] vs paper_forward: mean_abs=0.4352530539035797, max_abs=4.0, mean_rel=0.13976439833641052, max_rel=1001.94873046875, norm_rel=0.022390665486454964, ref_abs_avg=19.559785842895508, test_abs_avg=19.55953025817871
production_forward grad[88] vs paper_forward: mean_abs=0.39864903688430786, max_abs=5.125, mean_rel=0.18309557437896729, max_rel=1250.0, norm_rel=0.020909028127789497, ref_abs_avg=19.25068473815918, test_abs_avg=19.241853713989258
production_forward grad[89] vs paper_forward: mean_abs=0.3259704113006592, max_abs=1.625, mean_rel=0.134538471698761, max_rel=13.29662799835205, norm_rel=0.021096033975481987, ref_abs_avg=15.6720552444458, test_abs_avg=15.652399063110352
production_forward grad[90] vs paper_forward: mean_abs=0.4163878560066223, max_abs=5.0, mean_rel=0.1332177370786667, max_rel=616.0791625976562, norm_rel=0.02200506068766117, ref_abs_avg=19.07494354248047, test_abs_avg=19.074277877807617
production_forward grad[91] vs paper_forward: mean_abs=0.37830519676208496, max_abs=3.1875, mean_rel=0.17975448071956635, max_rel=999.9999389648438, norm_rel=0.020056629553437233, ref_abs_avg=19.044330596923828, test_abs_avg=19.03921127319336
production_forward grad[92] vs paper_forward: mean_abs=0.30398130416870117, max_abs=1.125, mean_rel=0.0765550509095192, max_rel=9.827763557434082, norm_rel=0.019346589222550392, ref_abs_avg=16.245655059814453, test_abs_avg=16.25115966796875
production_forward grad[93] vs paper_forward: mean_abs=0.3917021155357361, max_abs=4.5, mean_rel=0.13114161789417267, max_rel=660.47119140625, norm_rel=0.021393967792391777, ref_abs_avg=18.552532196044922, test_abs_avg=18.554391860961914
production_forward grad[94] vs paper_forward: mean_abs=0.34817221760749817, max_abs=3.25, mean_rel=0.15983422100543976, max_rel=1156.25, norm_rel=0.019169382750988007, ref_abs_avg=18.391395568847656, test_abs_avg=18.396717071533203
production_forward grad[95] vs paper_forward: mean_abs=0.2693866491317749, max_abs=1.125, mean_rel=0.08351200073957443, max_rel=4.211202621459961, norm_rel=0.017922917380928993, ref_abs_avg=14.930041313171387, test_abs_avg=14.929931640625
production_forward grad[96] vs paper_forward: mean_abs=0.3729589879512787, max_abs=4.75, mean_rel=0.12668712437152863, max_rel=635.4656372070312, norm_rel=0.020969053730368614, ref_abs_avg=18.070411682128906, test_abs_avg=18.06991195678711
production_forward grad[97] vs paper_forward: mean_abs=0.329145610332489, max_abs=3.0, mean_rel=0.16953717172145844, max_rel=2062.5, norm_rel=0.01921805553138256, ref_abs_avg=17.353620529174805, test_abs_avg=17.33956527709961
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016362924361601472, max_abs=0.05078125
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008660978637635708, max_abs=0.4296875, mean_rel=0.07525289058685303, max_rel=136.15968322753906, norm_rel=0.02054995857179165, ref_abs_avg=0.45441845059394836, test_abs_avg=0.4544254541397095
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.343995571136475, max_abs=64.0, mean_rel=0.13042554259300232, max_rel=132.57749938964844, norm_rel=0.020473113283514977, ref_abs_avg=314.8078308105469, test_abs_avg=314.7884216308594
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3056516647338867, max_abs=5.25, mean_rel=0.15872539579868317, max_rel=22.96299171447754, norm_rel=0.02406778745353222, ref_abs_avg=54.17042541503906, test_abs_avg=54.16938781738281
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6182987689971924, max_abs=11.0, mean_rel=0.1678583025932312, max_rel=1269.084716796875, norm_rel=0.0252169668674469, ref_abs_avg=64.59895324707031, test_abs_avg=64.59685516357422
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5066453218460083, max_abs=8.5, mean_rel=0.45825034379959106, max_rel=5749.99951171875, norm_rel=0.02386699989438057, ref_abs_avg=63.53062438964844, test_abs_avg=63.527862548828125
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.181626796722412, max_abs=3.75, mean_rel=0.37763282656669617, max_rel=138.0697784423828, norm_rel=0.024953922256827354, ref_abs_avg=45.64159393310547, test_abs_avg=45.46211242675781
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4227111339569092, max_abs=10.03125, mean_rel=0.1599298119544983, max_rel=1199.8314208984375, norm_rel=0.025018569082021713, ref_abs_avg=57.282318115234375, test_abs_avg=57.28192138671875
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3105502128601074, max_abs=7.5, mean_rel=0.4209594130516052, max_rel=4250.0, norm_rel=0.023334940895438194, ref_abs_avg=56.54450225830078, test_abs_avg=56.54875183105469
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0334439277648926, max_abs=4.5, mean_rel=0.12193770706653595, max_rel=12.741854667663574, norm_rel=0.024216631427407265, ref_abs_avg=43.430355072021484, test_abs_avg=43.36289978027344
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3002891540527344, max_abs=10.5, mean_rel=0.1629944145679474, max_rel=1551.9847412109375, norm_rel=0.02472485788166523, ref_abs_avg=52.95016098022461, test_abs_avg=52.94516372680664
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1962790489196777, max_abs=7.34375, mean_rel=0.4406841993331909, max_rel=4437.5, norm_rel=0.02300683967769146, ref_abs_avg=52.29153823852539, test_abs_avg=52.283172607421875
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9585304260253906, max_abs=3.53125, mean_rel=0.09262843430042267, max_rel=5.122297286987305, norm_rel=0.02314823679625988, ref_abs_avg=41.679283142089844, test_abs_avg=41.6728515625
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2038575410842896, max_abs=9.5, mean_rel=0.18237604200839996, max_rel=2549.43896484375, norm_rel=0.024504372850060463, ref_abs_avg=49.442909240722656, test_abs_avg=49.440673828125
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1046357154846191, max_abs=6.5, mean_rel=0.30424416065216064, max_rel=3312.499755859375, norm_rel=0.022856611758470535, ref_abs_avg=48.61642837524414, test_abs_avg=48.62352752685547
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8898831605911255, max_abs=3.5, mean_rel=0.10864640772342682, max_rel=20.457632064819336, norm_rel=0.021817749366164207, ref_abs_avg=40.14910888671875, test_abs_avg=40.18754196166992
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1266084909439087, max_abs=7.0, mean_rel=0.16666993498802185, max_rel=1755.0411376953125, norm_rel=0.024412741884589195, ref_abs_avg=46.412269592285156, test_abs_avg=46.41088104248047
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0384995937347412, max_abs=6.25, mean_rel=0.3620237112045288, max_rel=3031.249755859375, norm_rel=0.022825468331575394, ref_abs_avg=45.70769500732422, test_abs_avg=45.70154571533203
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.863419771194458, max_abs=3.265625, mean_rel=0.08573368191719055, max_rel=4.131057262420654, norm_rel=0.02333734557032585, ref_abs_avg=37.31761932373047, test_abs_avg=37.40165710449219
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0557507276535034, max_abs=7.0, mean_rel=0.17746329307556152, max_rel=1882.4676513671875, norm_rel=0.024223899468779564, ref_abs_avg=43.81536865234375, test_abs_avg=43.81602096557617
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.970282256603241, max_abs=6.0, mean_rel=0.32777899503707886, max_rel=3031.249755859375, norm_rel=0.02240835130214691, ref_abs_avg=43.551422119140625, test_abs_avg=43.551292419433594
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7517662048339844, max_abs=3.5, mean_rel=0.10969621688127518, max_rel=14.574570655822754, norm_rel=0.021535668522119522, ref_abs_avg=35.294403076171875, test_abs_avg=35.37925720214844
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0022988319396973, max_abs=6.5, mean_rel=0.16264569759368896, max_rel=1979.420166015625, norm_rel=0.02406897209584713, ref_abs_avg=41.891334533691406, test_abs_avg=41.89195251464844
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9165660738945007, max_abs=5.75, mean_rel=0.3070710301399231, max_rel=2749.999755859375, norm_rel=0.022181978449225426, ref_abs_avg=41.529502868652344, test_abs_avg=41.53009796142578
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7222514152526855, max_abs=2.90625, mean_rel=0.12506434321403503, max_rel=16.031936645507812, norm_rel=0.022108497098088264, ref_abs_avg=33.006980895996094, test_abs_avg=32.92559051513672
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.953040599822998, max_abs=6.0, mean_rel=0.16288113594055176, max_rel=2366.91357421875, norm_rel=0.023831220343708992, ref_abs_avg=40.20010757446289, test_abs_avg=40.20301055908203
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8772127628326416, max_abs=5.75, mean_rel=0.35191434621810913, max_rel=4375.0, norm_rel=0.02221904695034027, ref_abs_avg=39.634490966796875, test_abs_avg=39.642356872558594
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8789198398590088, max_abs=3.5, mean_rel=0.13665451109409332, max_rel=16.313308715820312, norm_rel=0.024915574118494987, ref_abs_avg=34.43836975097656, test_abs_avg=34.490928649902344
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.098718285560608, max_abs=8.0, mean_rel=0.18375727534294128, max_rel=2226.976318359375, norm_rel=0.0257307980209589, ref_abs_avg=42.913673400878906, test_abs_avg=42.916053771972656
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0132761001586914, max_abs=6.3125, mean_rel=0.34374791383743286, max_rel=2687.499755859375, norm_rel=0.02406185306608677, ref_abs_avg=42.272499084472656, test_abs_avg=42.27344512939453
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8134760856628418, max_abs=3.5, mean_rel=0.15258517861366272, max_rel=13.972524642944336, norm_rel=0.02457721158862114, ref_abs_avg=32.768577575683594, test_abs_avg=32.786285400390625
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0213780403137207, max_abs=7.0, mean_rel=0.17054125666618347, max_rel=2542.31298828125, norm_rel=0.02607867866754532, ref_abs_avg=39.30220031738281, test_abs_avg=39.306304931640625
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9509742259979248, max_abs=6.0, mean_rel=0.33589106798171997, max_rel=2687.499755859375, norm_rel=0.024677982553839684, ref_abs_avg=38.66518783569336, test_abs_avg=38.66941452026367
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7537223100662231, max_abs=3.0, mean_rel=0.3583478629589081, max_rel=134.4708709716797, norm_rel=0.025758972391486168, ref_abs_avg=29.249446868896484, test_abs_avg=29.301603317260742
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9376161694526672, max_abs=6.75, mean_rel=0.17744368314743042, max_rel=1156.219482421875, norm_rel=0.025925077497959137, ref_abs_avg=36.30282211303711, test_abs_avg=36.30241394042969
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8820920586585999, max_abs=5.5, mean_rel=0.3202134668827057, max_rel=3593.749755859375, norm_rel=0.024413112550973892, ref_abs_avg=36.244895935058594, test_abs_avg=36.242427825927734
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7248239517211914, max_abs=3.25, mean_rel=0.1587645709514618, max_rel=16.636770248413086, norm_rel=0.02504785545170307, ref_abs_avg=28.26502799987793, test_abs_avg=28.34304428100586
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8870019912719727, max_abs=6.0, mean_rel=0.17233890295028687, max_rel=1608.4486083984375, norm_rel=0.025640761479735374, ref_abs_avg=34.75468826293945, test_abs_avg=34.75785827636719
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8290377855300903, max_abs=6.0, mean_rel=0.347728967666626, max_rel=2624.999755859375, norm_rel=0.02438175491988659, ref_abs_avg=34.067928314208984, test_abs_avg=34.07307434082031
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.61903977394104, max_abs=3.125, mean_rel=0.07935947179794312, max_rel=4.234306335449219, norm_rel=0.022479385137557983, ref_abs_avg=28.468690872192383, test_abs_avg=28.4835147857666
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.836767315864563, max_abs=6.0, mean_rel=0.1625739336013794, max_rel=1495.7647705078125, norm_rel=0.02543383091688156, ref_abs_avg=33.04030227661133, test_abs_avg=33.04085922241211
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.78106689453125, max_abs=4.875, mean_rel=0.24917331337928772, max_rel=3281.249755859375, norm_rel=0.023972833529114723, ref_abs_avg=32.67892837524414, test_abs_avg=32.682716369628906
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6503667831420898, max_abs=2.625, mean_rel=0.12060337513685226, max_rel=11.743595123291016, norm_rel=0.023395445197820663, ref_abs_avg=27.595428466796875, test_abs_avg=27.552494049072266
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7972822785377502, max_abs=7.0, mean_rel=0.16888871788978577, max_rel=1100.296630859375, norm_rel=0.02501721680164337, ref_abs_avg=31.95787811279297, test_abs_avg=31.958616256713867
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7422366738319397, max_abs=5.375, mean_rel=0.24367277324199677, max_rel=1999.9998779296875, norm_rel=0.023832708597183228, ref_abs_avg=31.2647762298584, test_abs_avg=31.268470764160156
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.6213312149047852, max_abs=2.25, mean_rel=0.0655876025557518, max_rel=3.2559831142425537, norm_rel=0.023743513971567154, ref_abs_avg=26.227962493896484, test_abs_avg=26.178974151611328
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7536134719848633, max_abs=5.0, mean_rel=0.1626080721616745, max_rel=858.8258056640625, norm_rel=0.024957600980997086, ref_abs_avg=30.270509719848633, test_abs_avg=30.272254943847656
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7014805674552917, max_abs=4.0625, mean_rel=0.28315991163253784, max_rel=1906.2498779296875, norm_rel=0.02359316498041153, ref_abs_avg=29.794906616210938, test_abs_avg=29.802709579467773
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5434625148773193, max_abs=2.25, mean_rel=0.10142628848552704, max_rel=5.044223308563232, norm_rel=0.0226884912699461, ref_abs_avg=24.01276969909668, test_abs_avg=24.015037536621094
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7247745990753174, max_abs=5.5, mean_rel=0.15657612681388855, max_rel=788.898681640625, norm_rel=0.02460235171020031, ref_abs_avg=29.52189826965332, test_abs_avg=29.523284912109375
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6765789985656738, max_abs=4.546875, mean_rel=0.33369413018226624, max_rel=2437.5, norm_rel=0.0231012050062418, ref_abs_avg=29.35482406616211, test_abs_avg=29.35512924194336
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.607446551322937, max_abs=2.5, mean_rel=0.23947536945343018, max_rel=32.983646392822266, norm_rel=0.023624012246727943, ref_abs_avg=25.638328552246094, test_abs_avg=25.6323299407959
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8212504386901855, max_abs=6.0, mean_rel=0.1731703281402588, max_rel=1428.464111328125, norm_rel=0.026519281789660454, ref_abs_avg=31.053377151489258, test_abs_avg=31.055360794067383
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.755200982093811, max_abs=5.5, mean_rel=0.30408430099487305, max_rel=2375.0, norm_rel=0.024726424366235733, ref_abs_avg=30.655466079711914, test_abs_avg=30.65290641784668
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5893983840942383, max_abs=2.296875, mean_rel=0.30438995361328125, max_rel=97.33314514160156, norm_rel=0.02414802834391594, ref_abs_avg=24.42874526977539, test_abs_avg=24.441844940185547
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.742159903049469, max_abs=5.25, mean_rel=0.17906461656093597, max_rel=937.77880859375, norm_rel=0.025801286101341248, ref_abs_avg=28.868560791015625, test_abs_avg=28.86962890625
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6942598819732666, max_abs=4.25, mean_rel=0.28220927715301514, max_rel=1984.3748779296875, norm_rel=0.024580776691436768, ref_abs_avg=28.32145881652832, test_abs_avg=28.316360473632812
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5386109352111816, max_abs=2.25, mean_rel=0.24504008889198303, max_rel=48.053340911865234, norm_rel=0.02477133460342884, ref_abs_avg=22.281295776367188, test_abs_avg=22.3317813873291
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.686309814453125, max_abs=4.5, mean_rel=0.16652333736419678, max_rel=1374.1427001953125, norm_rel=0.025224903598427773, ref_abs_avg=27.271678924560547, test_abs_avg=27.272357940673828
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6413107514381409, max_abs=4.5, mean_rel=0.26709726452827454, max_rel=2125.0, norm_rel=0.023674611002206802, ref_abs_avg=27.106552124023438, test_abs_avg=27.10674285888672
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.4785609245300293, max_abs=1.921875, mean_rel=0.08194401115179062, max_rel=3.499669313430786, norm_rel=0.02114592120051384, ref_abs_avg=22.71902084350586, test_abs_avg=22.74578857421875
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6530255079269409, max_abs=5.5, mean_rel=0.15911561250686646, max_rel=1415.3599853515625, norm_rel=0.02487608790397644, ref_abs_avg=26.239456176757812, test_abs_avg=26.23929214477539
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6036359071731567, max_abs=4.5, mean_rel=0.2486495077610016, max_rel=1687.4998779296875, norm_rel=0.023431317880749702, ref_abs_avg=25.771766662597656, test_abs_avg=25.766820907592773
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4800910949707031, max_abs=1.875, mean_rel=0.07671448588371277, max_rel=5.431502819061279, norm_rel=0.02316553145647049, ref_abs_avg=20.66823387145996, test_abs_avg=20.65212059020996
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6119776964187622, max_abs=5.5, mean_rel=0.15841710567474365, max_rel=1421.273193359375, norm_rel=0.024502720683813095, ref_abs_avg=24.959190368652344, test_abs_avg=24.96002197265625
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5677240490913391, max_abs=4.75, mean_rel=0.2318144142627716, max_rel=2312.5, norm_rel=0.022786010056734085, ref_abs_avg=25.00151824951172, test_abs_avg=25.0048828125
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4422011375427246, max_abs=1.75, mean_rel=0.1315576136112213, max_rel=26.347488403320312, norm_rel=0.022709766402840614, ref_abs_avg=19.801610946655273, test_abs_avg=19.80999755859375
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5792577266693115, max_abs=5.0, mean_rel=0.15376216173171997, max_rel=1632.9481201171875, norm_rel=0.024014335125684738, ref_abs_avg=24.149688720703125, test_abs_avg=24.15005874633789
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5355680584907532, max_abs=4.0, mean_rel=0.22580967843532562, max_rel=1843.7498779296875, norm_rel=0.02218332327902317, ref_abs_avg=24.06630516052246, test_abs_avg=24.073144912719727
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.44222477078437805, max_abs=1.9375, mean_rel=0.12655006349086761, max_rel=9.320154190063477, norm_rel=0.022639524191617966, ref_abs_avg=19.509904861450195, test_abs_avg=19.492956161499023
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5509800314903259, max_abs=5.0, mean_rel=0.15130449831485748, max_rel=924.2694091796875, norm_rel=0.02379564940929413, ref_abs_avg=23.191558837890625, test_abs_avg=23.191221237182617
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5147429704666138, max_abs=4.25, mean_rel=0.20131796598434448, max_rel=1781.2498779296875, norm_rel=0.022139616310596466, ref_abs_avg=23.200538635253906, test_abs_avg=23.204200744628906
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.40148258209228516, max_abs=1.4375, mean_rel=0.13097554445266724, max_rel=25.760643005371094, norm_rel=0.021178128197789192, ref_abs_avg=18.65290641784668, test_abs_avg=18.620677947998047
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5319697260856628, max_abs=4.5, mean_rel=0.14500382542610168, max_rel=748.6931762695312, norm_rel=0.023092765361070633, ref_abs_avg=23.017032623291016, test_abs_avg=23.016685485839844
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4798606038093567, max_abs=3.5, mean_rel=0.22811666131019592, max_rel=1687.4998779296875, norm_rel=0.021448373794555664, ref_abs_avg=22.395889282226562, test_abs_avg=22.3976993560791
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4435077905654907, max_abs=1.53125, mean_rel=0.11036857217550278, max_rel=7.727236747741699, norm_rel=0.022432243451476097, ref_abs_avg=20.05221176147461, test_abs_avg=20.09811019897461
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5817424654960632, max_abs=5.0, mean_rel=0.159222811460495, max_rel=951.1388549804688, norm_rel=0.024780407547950745, ref_abs_avg=23.52664566040039, test_abs_avg=23.526302337646484
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5466957092285156, max_abs=4.0, mean_rel=0.23312492668628693, max_rel=1617.1873779296875, norm_rel=0.02331019751727581, ref_abs_avg=23.46246337890625, test_abs_avg=23.461517333984375
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.41034626960754395, max_abs=1.9375, mean_rel=0.24497807025909424, max_rel=55.97063446044922, norm_rel=0.021227169781923294, ref_abs_avg=19.21063804626465, test_abs_avg=19.210224151611328
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5386167168617249, max_abs=5.0, mean_rel=0.1513577252626419, max_rel=1405.6463623046875, norm_rel=0.0241271760314703, ref_abs_avg=22.370532989501953, test_abs_avg=22.37106704711914
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.49721723794937134, max_abs=3.75, mean_rel=0.2872718572616577, max_rel=2187.5, norm_rel=0.02278401516377926, ref_abs_avg=21.85501480102539, test_abs_avg=21.85199546813965
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.38495302200317383, max_abs=1.75, mean_rel=0.15723174810409546, max_rel=38.869224548339844, norm_rel=0.022842291742563248, ref_abs_avg=16.17538833618164, test_abs_avg=16.177875518798828
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.4969022572040558, max_abs=4.0, mean_rel=0.14933590590953827, max_rel=730.9261474609375, norm_rel=0.023523027077317238, ref_abs_avg=21.172136306762695, test_abs_avg=21.17337417602539
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4623124301433563, max_abs=3.75, mean_rel=0.2343204915523529, max_rel=1593.7498779296875, norm_rel=0.02225322648882866, ref_abs_avg=20.885786056518555, test_abs_avg=20.879762649536133
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.34480857849121094, max_abs=1.3125, mean_rel=0.14975036680698395, max_rel=30.123424530029297, norm_rel=0.020994722843170166, ref_abs_avg=16.810569763183594, test_abs_avg=16.82654571533203
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.46541455388069153, max_abs=4.75, mean_rel=0.14230456948280334, max_rel=906.4537963867188, norm_rel=0.02304261550307274, ref_abs_avg=20.239524841308594, test_abs_avg=20.240943908691406
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.42803633213043213, max_abs=4.25, mean_rel=0.2003186047077179, max_rel=1062.5, norm_rel=0.02165435254573822, ref_abs_avg=19.9625244140625, test_abs_avg=19.959138870239258
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.33037471771240234, max_abs=1.3125, mean_rel=0.08058637380599976, max_rel=3.3917839527130127, norm_rel=0.020142672583460808, ref_abs_avg=16.544313430786133, test_abs_avg=16.572017669677734
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4382360279560089, max_abs=4.0, mean_rel=0.14065608382225037, max_rel=808.498291015625, norm_rel=0.022518960759043694, ref_abs_avg=19.559785842895508, test_abs_avg=19.559772491455078
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4019927978515625, max_abs=3.84375, mean_rel=0.18834443390369415, max_rel=1125.0, norm_rel=0.021074356511235237, ref_abs_avg=19.25068473815918, test_abs_avg=19.24003028869629
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3182905912399292, max_abs=1.125, mean_rel=0.12961770594120026, max_rel=18.71052360534668, norm_rel=0.020463932305574417, ref_abs_avg=15.6720552444458, test_abs_avg=15.685938835144043
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.41778796911239624, max_abs=4.0, mean_rel=0.13295084238052368, max_rel=525.565673828125, norm_rel=0.022060031071305275, ref_abs_avg=19.07494354248047, test_abs_avg=19.074275970458984
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3773112893104553, max_abs=3.5, mean_rel=0.18788456916809082, max_rel=1562.4998779296875, norm_rel=0.019989343360066414, ref_abs_avg=19.044330596923828, test_abs_avg=19.034427642822266
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.30088019371032715, max_abs=1.375, mean_rel=0.06944234669208527, max_rel=6.448783874511719, norm_rel=0.01967288926243782, ref_abs_avg=16.245655059814453, test_abs_avg=16.244029998779297
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3926889896392822, max_abs=4.5, mean_rel=0.1325758993625641, max_rel=806.4028930664062, norm_rel=0.021427948027849197, ref_abs_avg=18.552532196044922, test_abs_avg=18.55344581604004
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3555677533149719, max_abs=3.0, mean_rel=0.16935160756111145, max_rel=1281.25, norm_rel=0.01958988793194294, ref_abs_avg=18.391395568847656, test_abs_avg=18.395305633544922
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.28503119945526123, max_abs=1.375, mean_rel=0.11281095445156097, max_rel=12.441933631896973, norm_rel=0.018865900114178658, ref_abs_avg=14.930041313171387, test_abs_avg=14.93899154663086
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.37222719192504883, max_abs=5.75, mean_rel=0.1271689534187317, max_rel=785.9574584960938, norm_rel=0.020948639139533043, ref_abs_avg=18.070411682128906, test_abs_avg=18.069900512695312
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.32847318053245544, max_abs=3.0, mean_rel=0.17958535254001617, max_rel=1718.7498779296875, norm_rel=0.019193269312381744, ref_abs_avg=17.353620529174805, test_abs_avg=17.342554092407227
production_forward2 vs paper_forward output: mean_abs=0.0016344812465831637, max_abs=0.0546875
production_forward2 grad[0] vs paper_forward: mean_abs=0.008324876427650452, max_abs=0.4921875, mean_rel=0.07264275848865509, max_rel=283.4215393066406, norm_rel=0.019882801920175552, ref_abs_avg=0.45441845059394836, test_abs_avg=0.45443838834762573
production_forward2 grad[1] vs paper_forward: mean_abs=7.209749698638916, max_abs=56.0, mean_rel=0.12589143216609955, max_rel=109.22783660888672, norm_rel=0.020156383514404297, ref_abs_avg=314.8078308105469, test_abs_avg=314.8313293457031
production_forward2 grad[2] vs paper_forward: mean_abs=1.2054100036621094, max_abs=5.0, mean_rel=0.16519300639629364, max_rel=26.802377700805664, norm_rel=0.022662343457341194, ref_abs_avg=54.17042541503906, test_abs_avg=54.13069152832031
production_forward2 grad[3] vs paper_forward: mean_abs=1.572697639465332, max_abs=11.5, mean_rel=0.16635273396968842, max_rel=1708.16064453125, norm_rel=0.0245023462921381, ref_abs_avg=64.59895324707031, test_abs_avg=64.60001373291016
production_forward2 grad[4] vs paper_forward: mean_abs=1.4564557075500488, max_abs=9.0, mean_rel=0.4540036916732788, max_rel=5687.49951171875, norm_rel=0.023081105202436447, ref_abs_avg=63.53062438964844, test_abs_avg=63.530738830566406
production_forward2 grad[5] vs paper_forward: mean_abs=1.1565048694610596, max_abs=3.75, mean_rel=0.3264463245868683, max_rel=118.48311614990234, norm_rel=0.0243802759796381, ref_abs_avg=45.64159393310547, test_abs_avg=45.512184143066406
production_forward2 grad[6] vs paper_forward: mean_abs=1.382358431816101, max_abs=10.0, mean_rel=0.16395172476768494, max_rel=2476.4873046875, norm_rel=0.024326996877789497, ref_abs_avg=57.282318115234375, test_abs_avg=57.28483200073242
production_forward2 grad[7] vs paper_forward: mean_abs=1.26904296875, max_abs=8.0, mean_rel=0.41066139936447144, max_rel=4625.0, norm_rel=0.02258337289094925, ref_abs_avg=56.54450225830078, test_abs_avg=56.55049133300781
production_forward2 grad[8] vs paper_forward: mean_abs=1.019148349761963, max_abs=4.375, mean_rel=0.1131591945886612, max_rel=12.49057674407959, norm_rel=0.023569481447339058, ref_abs_avg=43.430355072021484, test_abs_avg=43.40074157714844
production_forward2 grad[9] vs paper_forward: mean_abs=1.2644119262695312, max_abs=9.0, mean_rel=0.15877923369407654, max_rel=899.4017333984375, norm_rel=0.024045972153544426, ref_abs_avg=52.95016098022461, test_abs_avg=52.94877624511719
production_forward2 grad[10] vs paper_forward: mean_abs=1.1589603424072266, max_abs=6.6875, mean_rel=0.4415279030799866, max_rel=4312.5, norm_rel=0.0222710482776165, ref_abs_avg=52.29153823852539, test_abs_avg=52.28376007080078
production_forward2 grad[11] vs paper_forward: mean_abs=0.9520168304443359, max_abs=3.28125, mean_rel=0.09387747943401337, max_rel=4.666011810302734, norm_rel=0.022734716534614563, ref_abs_avg=41.679283142089844, test_abs_avg=41.70176696777344
production_forward2 grad[12] vs paper_forward: mean_abs=1.1724166870117188, max_abs=10.0, mean_rel=0.17474539577960968, max_rel=2061.329345703125, norm_rel=0.023879561573266983, ref_abs_avg=49.442909240722656, test_abs_avg=49.442657470703125
production_forward2 grad[13] vs paper_forward: mean_abs=1.0802733898162842, max_abs=6.625, mean_rel=0.32799994945526123, max_rel=3406.249755859375, norm_rel=0.022349948063492775, ref_abs_avg=48.61642837524414, test_abs_avg=48.620941162109375
production_forward2 grad[14] vs paper_forward: mean_abs=0.8798007965087891, max_abs=3.375, mean_rel=0.11107034981250763, max_rel=21.966724395751953, norm_rel=0.021704459562897682, ref_abs_avg=40.14910888671875, test_abs_avg=40.170005798339844
production_forward2 grad[15] vs paper_forward: mean_abs=1.0985356569290161, max_abs=8.0, mean_rel=0.1639106720685959, max_rel=1756.24853515625, norm_rel=0.023814469575881958, ref_abs_avg=46.412269592285156, test_abs_avg=46.412864685058594
production_forward2 grad[16] vs paper_forward: mean_abs=1.0129797458648682, max_abs=6.75, mean_rel=0.31035280227661133, max_rel=3374.999755859375, norm_rel=0.02226846292614937, ref_abs_avg=45.70769500732422, test_abs_avg=45.70228576660156
production_forward2 grad[17] vs paper_forward: mean_abs=0.8211588859558105, max_abs=3.328125, mean_rel=0.08413796126842499, max_rel=4.783606052398682, norm_rel=0.02208445779979229, ref_abs_avg=37.31761932373047, test_abs_avg=37.387184143066406
production_forward2 grad[18] vs paper_forward: mean_abs=1.0314135551452637, max_abs=7.0, mean_rel=0.17480459809303284, max_rel=1756.1866455078125, norm_rel=0.02369278483092785, ref_abs_avg=43.81536865234375, test_abs_avg=43.81806564331055
production_forward2 grad[19] vs paper_forward: mean_abs=0.9454120397567749, max_abs=6.5, mean_rel=0.32190483808517456, max_rel=3281.249755859375, norm_rel=0.021848997101187706, ref_abs_avg=43.551422119140625, test_abs_avg=43.55466079711914
production_forward2 grad[20] vs paper_forward: mean_abs=0.7588176727294922, max_abs=3.5, mean_rel=0.11673389375209808, max_rel=15.264082908630371, norm_rel=0.022033562883734703, ref_abs_avg=35.294403076171875, test_abs_avg=35.36418533325195
production_forward2 grad[21] vs paper_forward: mean_abs=0.9814146757125854, max_abs=6.0, mean_rel=0.1588641107082367, max_rel=2040.828125, norm_rel=0.02357647940516472, ref_abs_avg=41.891334533691406, test_abs_avg=41.89385223388672
production_forward2 grad[22] vs paper_forward: mean_abs=0.8964749574661255, max_abs=5.0, mean_rel=0.26485905051231384, max_rel=2375.0, norm_rel=0.021682431921362877, ref_abs_avg=41.529502868652344, test_abs_avg=41.52964401245117
production_forward2 grad[23] vs paper_forward: mean_abs=0.703031063079834, max_abs=2.5390625, mean_rel=0.1194927766919136, max_rel=14.6292142868042, norm_rel=0.021398894488811493, ref_abs_avg=33.006980895996094, test_abs_avg=32.95063400268555
production_forward2 grad[24] vs paper_forward: mean_abs=0.9345271587371826, max_abs=7.0, mean_rel=0.1556698977947235, max_rel=2506.144287109375, norm_rel=0.02338619902729988, ref_abs_avg=40.20010757446289, test_abs_avg=40.203887939453125
production_forward2 grad[25] vs paper_forward: mean_abs=0.8575114011764526, max_abs=5.375, mean_rel=0.3245236873626709, max_rel=3187.499755859375, norm_rel=0.02173038385808468, ref_abs_avg=39.634490966796875, test_abs_avg=39.641727447509766
production_forward2 grad[26] vs paper_forward: mean_abs=0.8625357151031494, max_abs=3.5, mean_rel=0.15501615405082703, max_rel=18.392236709594727, norm_rel=0.024823257699608803, ref_abs_avg=34.43836975097656, test_abs_avg=34.44978713989258
production_forward2 grad[27] vs paper_forward: mean_abs=1.075392246246338, max_abs=7.0, mean_rel=0.1760292500257492, max_rel=2293.4619140625, norm_rel=0.025202011689543724, ref_abs_avg=42.913673400878906, test_abs_avg=42.91542434692383
production_forward2 grad[28] vs paper_forward: mean_abs=0.9958857297897339, max_abs=5.75, mean_rel=0.3264125883579254, max_rel=2812.499755859375, norm_rel=0.023668572306632996, ref_abs_avg=42.272499084472656, test_abs_avg=42.27351379394531
production_forward2 grad[29] vs paper_forward: mean_abs=0.7913985252380371, max_abs=3.5, mean_rel=0.13226750493049622, max_rel=13.093015670776367, norm_rel=0.023747609928250313, ref_abs_avg=32.768577575683594, test_abs_avg=32.79868698120117
production_forward2 grad[30] vs paper_forward: mean_abs=1.002370834350586, max_abs=6.5, mean_rel=0.1618727594614029, max_rel=1903.4202880859375, norm_rel=0.025619998574256897, ref_abs_avg=39.30220031738281, test_abs_avg=39.304649353027344
production_forward2 grad[31] vs paper_forward: mean_abs=0.9302586317062378, max_abs=5.5, mean_rel=0.3399850130081177, max_rel=3406.249755859375, norm_rel=0.024146493524312973, ref_abs_avg=38.66518783569336, test_abs_avg=38.67036819458008
production_forward2 grad[32] vs paper_forward: mean_abs=0.7158082723617554, max_abs=3.09375, mean_rel=0.30337610840797424, max_rel=109.43256378173828, norm_rel=0.024462031200528145, ref_abs_avg=29.249446868896484, test_abs_avg=29.28200340270996
production_forward2 grad[33] vs paper_forward: mean_abs=0.9207857251167297, max_abs=7.0, mean_rel=0.17172756791114807, max_rel=1243.12060546875, norm_rel=0.0254783034324646, ref_abs_avg=36.30282211303711, test_abs_avg=36.304569244384766
production_forward2 grad[34] vs paper_forward: mean_abs=0.8645457625389099, max_abs=5.5625, mean_rel=0.29668128490448, max_rel=2749.999755859375, norm_rel=0.023919319733977318, ref_abs_avg=36.244895935058594, test_abs_avg=36.245506286621094
production_forward2 grad[35] vs paper_forward: mean_abs=0.6962201595306396, max_abs=3.3125, mean_rel=0.13154925405979156, max_rel=17.695405960083008, norm_rel=0.02448827400803566, ref_abs_avg=28.26502799987793, test_abs_avg=28.337108612060547
production_forward2 grad[36] vs paper_forward: mean_abs=0.8721567392349243, max_abs=6.0, mean_rel=0.17605531215667725, max_rel=1886.6405029296875, norm_rel=0.025220785290002823, ref_abs_avg=34.75468826293945, test_abs_avg=34.75828552246094
production_forward2 grad[37] vs paper_forward: mean_abs=0.8132742047309875, max_abs=6.25, mean_rel=0.33419597148895264, max_rel=2562.5, norm_rel=0.02393346093595028, ref_abs_avg=34.067928314208984, test_abs_avg=34.075496673583984
production_forward2 grad[38] vs paper_forward: mean_abs=0.6339473724365234, max_abs=3.125, mean_rel=0.0910908430814743, max_rel=5.149725437164307, norm_rel=0.022774316370487213, ref_abs_avg=28.468690872192383, test_abs_avg=28.457901000976562
production_forward2 grad[39] vs paper_forward: mean_abs=0.8259449005126953, max_abs=6.5, mean_rel=0.15975156426429749, max_rel=1376.2867431640625, norm_rel=0.025098854675889015, ref_abs_avg=33.04030227661133, test_abs_avg=33.041046142578125
production_forward2 grad[40] vs paper_forward: mean_abs=0.7681624889373779, max_abs=5.875, mean_rel=0.2530667185783386, max_rel=3531.249755859375, norm_rel=0.023578159511089325, ref_abs_avg=32.67892837524414, test_abs_avg=32.6832389831543
production_forward2 grad[41] vs paper_forward: mean_abs=0.6363635063171387, max_abs=2.6875, mean_rel=0.13422314822673798, max_rel=24.827367782592773, norm_rel=0.02332434244453907, ref_abs_avg=27.595428466796875, test_abs_avg=27.55910873413086
production_forward2 grad[42] vs paper_forward: mean_abs=0.78684401512146, max_abs=6.0, mean_rel=0.17024719715118408, max_rel=1536.8748779296875, norm_rel=0.024691978469491005, ref_abs_avg=31.95787811279297, test_abs_avg=31.960540771484375
production_forward2 grad[43] vs paper_forward: mean_abs=0.7301410436630249, max_abs=5.0, mean_rel=0.23830009996891022, max_rel=2187.5, norm_rel=0.023437952622771263, ref_abs_avg=31.2647762298584, test_abs_avg=31.2667236328125
production_forward2 grad[44] vs paper_forward: mean_abs=0.6108589172363281, max_abs=2.25, mean_rel=0.0734115019440651, max_rel=3.2559831142425537, norm_rel=0.023268800228834152, ref_abs_avg=26.227962493896484, test_abs_avg=26.17993927001953
production_forward2 grad[45] vs paper_forward: mean_abs=0.744001030921936, max_abs=5.5, mean_rel=0.16227677464485168, max_rel=1437.3638916015625, norm_rel=0.024641405791044235, ref_abs_avg=30.270509719848633, test_abs_avg=30.27100372314453
production_forward2 grad[46] vs paper_forward: mean_abs=0.6921640038490295, max_abs=4.25, mean_rel=0.27829062938690186, max_rel=2093.75, norm_rel=0.023280732333660126, ref_abs_avg=29.794906616210938, test_abs_avg=29.802440643310547
production_forward2 grad[47] vs paper_forward: mean_abs=0.540158748626709, max_abs=2.25, mean_rel=0.1023159921169281, max_rel=4.426397323608398, norm_rel=0.022706078365445137, ref_abs_avg=24.01276969909668, test_abs_avg=24.02025604248047
production_forward2 grad[48] vs paper_forward: mean_abs=0.716802716255188, max_abs=5.0, mean_rel=0.1498228907585144, max_rel=502.23095703125, norm_rel=0.024346409365534782, ref_abs_avg=29.52189826965332, test_abs_avg=29.523279190063477
production_forward2 grad[49] vs paper_forward: mean_abs=0.6693341732025146, max_abs=5.0, mean_rel=0.3275161385536194, max_rel=2562.5, norm_rel=0.022858085110783577, ref_abs_avg=29.35482406616211, test_abs_avg=29.356910705566406
production_forward2 grad[50] vs paper_forward: mean_abs=0.5915709733963013, max_abs=2.5625, mean_rel=0.1588262915611267, max_rel=23.491161346435547, norm_rel=0.02314768172800541, ref_abs_avg=25.638328552246094, test_abs_avg=25.646018981933594
production_forward2 grad[51] vs paper_forward: mean_abs=0.809301495552063, max_abs=6.0, mean_rel=0.1691647171974182, max_rel=1418.4774169921875, norm_rel=0.026158466935157776, ref_abs_avg=31.053377151489258, test_abs_avg=31.05554962158203
production_forward2 grad[52] vs paper_forward: mean_abs=0.7472457885742188, max_abs=5.0, mean_rel=0.30112558603286743, max_rel=2375.0, norm_rel=0.024439403787255287, ref_abs_avg=30.655466079711914, test_abs_avg=30.657625198364258
production_forward2 grad[53] vs paper_forward: mean_abs=0.5897393226623535, max_abs=2.5, mean_rel=0.3062290847301483, max_rel=93.3399887084961, norm_rel=0.024296654388308525, ref_abs_avg=24.42874526977539, test_abs_avg=24.451419830322266
production_forward2 grad[54] vs paper_forward: mean_abs=0.7325382828712463, max_abs=5.5, mean_rel=0.17428700625896454, max_rel=1026.299072265625, norm_rel=0.025451937690377235, ref_abs_avg=28.868560791015625, test_abs_avg=28.869915008544922
production_forward2 grad[55] vs paper_forward: mean_abs=0.6831139326095581, max_abs=4.375, mean_rel=0.2576141357421875, max_rel=2203.125, norm_rel=0.024191802367568016, ref_abs_avg=28.32145881652832, test_abs_avg=28.3164119720459
production_forward2 grad[56] vs paper_forward: mean_abs=0.5148220062255859, max_abs=2.25, mean_rel=0.15096960961818695, max_rel=25.214630126953125, norm_rel=0.02372078225016594, ref_abs_avg=22.281295776367188, test_abs_avg=22.31824493408203
production_forward2 grad[57] vs paper_forward: mean_abs=0.6784414052963257, max_abs=5.09375, mean_rel=0.16464078426361084, max_rel=1554.12744140625, norm_rel=0.0249446090310812, ref_abs_avg=27.271678924560547, test_abs_avg=27.273178100585938
production_forward2 grad[58] vs paper_forward: mean_abs=0.6308825612068176, max_abs=4.25, mean_rel=0.259933739900589, max_rel=1843.7498779296875, norm_rel=0.023284297436475754, ref_abs_avg=27.106552124023438, test_abs_avg=27.10440444946289
production_forward2 grad[59] vs paper_forward: mean_abs=0.4772624969482422, max_abs=2.234375, mean_rel=0.08047029376029968, max_rel=3.0587661266326904, norm_rel=0.02108473889529705, ref_abs_avg=22.71902084350586, test_abs_avg=22.760189056396484
production_forward2 grad[60] vs paper_forward: mean_abs=0.6453008651733398, max_abs=5.5, mean_rel=0.15544962882995605, max_rel=1366.309326171875, norm_rel=0.02462136372923851, ref_abs_avg=26.239456176757812, test_abs_avg=26.24004554748535
production_forward2 grad[61] vs paper_forward: mean_abs=0.5954256057739258, max_abs=3.875, mean_rel=0.25092780590057373, max_rel=1531.2498779296875, norm_rel=0.02310998924076557, ref_abs_avg=25.771766662597656, test_abs_avg=25.767818450927734
production_forward2 grad[62] vs paper_forward: mean_abs=0.46674370765686035, max_abs=1.875, mean_rel=0.10095608234405518, max_rel=15.992758750915527, norm_rel=0.022342275828123093, ref_abs_avg=20.66823387145996, test_abs_avg=20.65064811706543
production_forward2 grad[63] vs paper_forward: mean_abs=0.606048583984375, max_abs=4.75, mean_rel=0.15518198907375336, max_rel=1314.84912109375, norm_rel=0.024298712611198425, ref_abs_avg=24.959190368652344, test_abs_avg=24.961204528808594
production_forward2 grad[64] vs paper_forward: mean_abs=0.5619937181472778, max_abs=4.0, mean_rel=0.24362260103225708, max_rel=1999.9998779296875, norm_rel=0.022542107850313187, ref_abs_avg=25.00151824951172, test_abs_avg=25.007972717285156
production_forward2 grad[65] vs paper_forward: mean_abs=0.43162429332733154, max_abs=1.625, mean_rel=0.14003680646419525, max_rel=29.524276733398438, norm_rel=0.02226918563246727, ref_abs_avg=19.801610946655273, test_abs_avg=19.79336166381836
production_forward2 grad[66] vs paper_forward: mean_abs=0.5744530558586121, max_abs=5.0, mean_rel=0.14972372353076935, max_rel=1051.5938720703125, norm_rel=0.02381473407149315, ref_abs_avg=24.149688720703125, test_abs_avg=24.150592803955078
production_forward2 grad[67] vs paper_forward: mean_abs=0.5317139625549316, max_abs=3.75, mean_rel=0.22657278180122375, max_rel=2125.0, norm_rel=0.022055257111787796, ref_abs_avg=24.06630516052246, test_abs_avg=24.074918746948242
production_forward2 grad[68] vs paper_forward: mean_abs=0.4469224214553833, max_abs=1.9375, mean_rel=0.13553911447525024, max_rel=12.580076217651367, norm_rel=0.02279265597462654, ref_abs_avg=19.509904861450195, test_abs_avg=19.507966995239258
production_forward2 grad[69] vs paper_forward: mean_abs=0.5455092191696167, max_abs=4.5, mean_rel=0.14785116910934448, max_rel=917.2681274414062, norm_rel=0.02358219027519226, ref_abs_avg=23.191558837890625, test_abs_avg=23.191547393798828
production_forward2 grad[70] vs paper_forward: mean_abs=0.5099676847457886, max_abs=3.5, mean_rel=0.21880516409873962, max_rel=1374.9998779296875, norm_rel=0.021929115056991577, ref_abs_avg=23.200538635253906, test_abs_avg=23.20367431640625
production_forward2 grad[71] vs paper_forward: mean_abs=0.40503787994384766, max_abs=1.625, mean_rel=0.14204567670822144, max_rel=34.29736328125, norm_rel=0.021386723965406418, ref_abs_avg=18.65290641784668, test_abs_avg=18.635963439941406
production_forward2 grad[72] vs paper_forward: mean_abs=0.5276790261268616, max_abs=5.0, mean_rel=0.14417968690395355, max_rel=761.7653198242188, norm_rel=0.022901935502886772, ref_abs_avg=23.017032623291016, test_abs_avg=23.017887115478516
production_forward2 grad[73] vs paper_forward: mean_abs=0.4760313630104065, max_abs=3.5, mean_rel=0.22402159869670868, max_rel=1562.4998779296875, norm_rel=0.021291038021445274, ref_abs_avg=22.395889282226562, test_abs_avg=22.396011352539062
production_forward2 grad[74] vs paper_forward: mean_abs=0.4329805374145508, max_abs=1.75, mean_rel=0.10480864346027374, max_rel=7.766636371612549, norm_rel=0.022212402895092964, ref_abs_avg=20.05221176147461, test_abs_avg=20.10350799560547
production_forward2 grad[75] vs paper_forward: mean_abs=0.5772093534469604, max_abs=4.5, mean_rel=0.15772730112075806, max_rel=992.3341674804688, norm_rel=0.024606775492429733, ref_abs_avg=23.52664566040039, test_abs_avg=23.52631378173828
production_forward2 grad[76] vs paper_forward: mean_abs=0.542076826095581, max_abs=4.0625, mean_rel=0.24647584557533264, max_rel=1749.9998779296875, norm_rel=0.023136893287301064, ref_abs_avg=23.46246337890625, test_abs_avg=23.459726333618164
production_forward2 grad[77] vs paper_forward: mean_abs=0.4140145182609558, max_abs=1.875, mean_rel=0.14196738600730896, max_rel=22.35970115661621, norm_rel=0.021325185894966125, ref_abs_avg=19.21063804626465, test_abs_avg=19.223167419433594
production_forward2 grad[78] vs paper_forward: mean_abs=0.5346622467041016, max_abs=6.0, mean_rel=0.14582978188991547, max_rel=1641.7615966796875, norm_rel=0.023964716121554375, ref_abs_avg=22.370532989501953, test_abs_avg=22.371593475341797
production_forward2 grad[79] vs paper_forward: mean_abs=0.4908364415168762, max_abs=3.5, mean_rel=0.2799050509929657, max_rel=2250.0, norm_rel=0.02249765954911709, ref_abs_avg=21.85501480102539, test_abs_avg=21.849334716796875
production_forward2 grad[80] vs paper_forward: mean_abs=0.3872408866882324, max_abs=2.0, mean_rel=0.13463851809501648, max_rel=28.36454200744629, norm_rel=0.022891340777277946, ref_abs_avg=16.17538833618164, test_abs_avg=16.167827606201172
production_forward2 grad[81] vs paper_forward: mean_abs=0.4931587874889374, max_abs=4.5, mean_rel=0.1475352942943573, max_rel=581.4579467773438, norm_rel=0.023353004828095436, ref_abs_avg=21.172136306762695, test_abs_avg=21.17261505126953
production_forward2 grad[82] vs paper_forward: mean_abs=0.45792752504348755, max_abs=3.5, mean_rel=0.22558221220970154, max_rel=1499.9998779296875, norm_rel=0.02203787863254547, ref_abs_avg=20.885786056518555, test_abs_avg=20.883853912353516
production_forward2 grad[83] vs paper_forward: mean_abs=0.3398439884185791, max_abs=1.5, mean_rel=0.13917392492294312, max_rel=28.169952392578125, norm_rel=0.0208427757024765, ref_abs_avg=16.810569763183594, test_abs_avg=16.81578826904297
production_forward2 grad[84] vs paper_forward: mean_abs=0.46156466007232666, max_abs=4.0, mean_rel=0.13771679997444153, max_rel=525.9137573242188, norm_rel=0.022861763834953308, ref_abs_avg=20.239524841308594, test_abs_avg=20.240684509277344
production_forward2 grad[85] vs paper_forward: mean_abs=0.4260503053665161, max_abs=3.8125, mean_rel=0.19024038314819336, max_rel=968.7499389648438, norm_rel=0.021475432440638542, ref_abs_avg=19.9625244140625, test_abs_avg=19.963077545166016
production_forward2 grad[86] vs paper_forward: mean_abs=0.3482193946838379, max_abs=1.291015625, mean_rel=0.08990196138620377, max_rel=8.380989074707031, norm_rel=0.02088778093457222, ref_abs_avg=16.544313430786133, test_abs_avg=16.581247329711914
production_forward2 grad[87] vs paper_forward: mean_abs=0.4352530539035797, max_abs=4.0, mean_rel=0.13976439833641052, max_rel=1001.94873046875, norm_rel=0.022390665486454964, ref_abs_avg=19.559785842895508, test_abs_avg=19.55953025817871
production_forward2 grad[88] vs paper_forward: mean_abs=0.39864903688430786, max_abs=5.125, mean_rel=0.18309557437896729, max_rel=1250.0, norm_rel=0.020909028127789497, ref_abs_avg=19.25068473815918, test_abs_avg=19.241853713989258
production_forward2 grad[89] vs paper_forward: mean_abs=0.3259704113006592, max_abs=1.625, mean_rel=0.134538471698761, max_rel=13.29662799835205, norm_rel=0.021096033975481987, ref_abs_avg=15.6720552444458, test_abs_avg=15.652399063110352
production_forward2 grad[90] vs paper_forward: mean_abs=0.4163878560066223, max_abs=5.0, mean_rel=0.1332177370786667, max_rel=616.0791625976562, norm_rel=0.02200506068766117, ref_abs_avg=19.07494354248047, test_abs_avg=19.074277877807617
production_forward2 grad[91] vs paper_forward: mean_abs=0.37830519676208496, max_abs=3.1875, mean_rel=0.17975448071956635, max_rel=999.9999389648438, norm_rel=0.020056629553437233, ref_abs_avg=19.044330596923828, test_abs_avg=19.03921127319336
production_forward2 grad[92] vs paper_forward: mean_abs=0.30398130416870117, max_abs=1.125, mean_rel=0.0765550509095192, max_rel=9.827763557434082, norm_rel=0.019346589222550392, ref_abs_avg=16.245655059814453, test_abs_avg=16.25115966796875
production_forward2 grad[93] vs paper_forward: mean_abs=0.3917021155357361, max_abs=4.5, mean_rel=0.13114161789417267, max_rel=660.47119140625, norm_rel=0.021393967792391777, ref_abs_avg=18.552532196044922, test_abs_avg=18.554391860961914
production_forward2 grad[94] vs paper_forward: mean_abs=0.34817221760749817, max_abs=3.25, mean_rel=0.15983422100543976, max_rel=1156.25, norm_rel=0.019169382750988007, ref_abs_avg=18.391395568847656, test_abs_avg=18.396717071533203
production_forward2 grad[95] vs paper_forward: mean_abs=0.2693866491317749, max_abs=1.125, mean_rel=0.08351200073957443, max_rel=4.211202621459961, norm_rel=0.017922917380928993, ref_abs_avg=14.930041313171387, test_abs_avg=14.929931640625
production_forward2 grad[96] vs paper_forward: mean_abs=0.3729589879512787, max_abs=4.75, mean_rel=0.12668712437152863, max_rel=635.4656372070312, norm_rel=0.020969053730368614, ref_abs_avg=18.070411682128906, test_abs_avg=18.06991195678711
production_forward2 grad[97] vs paper_forward: mean_abs=0.329145610332489, max_abs=3.0, mean_rel=0.16953717172145844, max_rel=2062.5, norm_rel=0.01921805553138256, ref_abs_avg=17.353620529174805, test_abs_avg=17.33956527709961
identity layers + randn queries
paper_forward fwd+bwd:  382.543 ms
paper_forward bwd-only: 302.566 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.742 GiB, fwd+bwd=32.492 GiB
production_forward2 fwd+bwd:  114.322 ms
production_forward2 bwd-only: 96.049 ms
production_forward2 peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=2.324 GiB, fwd+bwd=10.324 GiB
torch_compile_phases_forward fwd+bwd:  167.068 ms
torch_compile_phases_forward bwd-only: 132.946 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB
production_forward fwd+bwd:  113.579 ms
production_forward bwd-only: 96.142 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.324 GiB, fwd+bwd=10.324 GiB
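
Note on the comparison statistics: the log does not define the seven fields printed on each "vs paper_forward" line. The sketch below shows one conventional way such statistics could be computed, assuming elementwise absolute and relative differences against the reference and a ratio of L2 norms for norm_rel. The function name `compare`, the `eps` guard, and these definitions are assumptions for illustration, not the benchmark script's actual code.

```python
import math

def compare(test, ref, eps=1e-12):
    """Error statistics between two equal-length sequences of floats.

    Assumed definitions (not taken from the benchmark script):
    relative error is |test - ref| / (|ref| + eps), and norm_rel is
    ||test - ref||_2 / (||ref||_2 + eps).
    """
    diffs = [abs(t - r) for t, r in zip(test, ref)]
    rels = [d / (abs(r) + eps) for d, r in zip(diffs, ref)]
    n = len(diffs)
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    ref_norm = math.sqrt(sum(r * r for r in ref))
    return {
        "mean_abs": sum(diffs) / n,
        "max_abs": max(diffs),
        "mean_rel": sum(rels) / n,
        "max_rel": max(rels),
        "norm_rel": diff_norm / (ref_norm + eps),
        "ref_abs_avg": sum(abs(r) for r in ref) / n,
        "test_abs_avg": sum(abs(t) for t in test) / n,
    }

# Toy inputs, unrelated to the tensors in this log.
stats = compare([1.0, 2.5, -3.0], [1.0, 2.0, -4.0])
for name, value in stats.items():
    print(f"{name}={value}")
```

Under these assumed definitions, a very large max_rel next to a small norm_rel would usually point to a few reference entries close to zero inflating the elementwise ratio, rather than to a large overall mismatch.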

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016011937987059355, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.008027752861380577, max_abs=0.328125, mean_rel=0.07065524905920029, max_rel=92.44483184814453, norm_rel=0.01922805793583393, ref_abs_avg=0.4505610764026642, test_abs_avg=0.45057839155197144
production_forward grad[1] vs paper_forward: mean_abs=6.897030830383301, max_abs=56.0, mean_rel=0.15036311745643616, max_rel=275.4474182128906, norm_rel=0.019576242193579674, ref_abs_avg=308.77978515625, test_abs_avg=308.8616943359375
production_forward grad[2] vs paper_forward: mean_abs=1.219684362411499, max_abs=5.0, mean_rel=0.1781032383441925, max_rel=56.567623138427734, norm_rel=0.020853275433182716, ref_abs_avg=58.351951599121094, test_abs_avg=58.31175231933594
production_forward grad[3] vs paper_forward: mean_abs=1.5615500211715698, max_abs=12.0, mean_rel=0.17659631371498108, max_rel=1717.2392578125, norm_rel=0.02395966462790966, ref_abs_avg=65.63430786132812, test_abs_avg=65.63926696777344
production_forward grad[4] vs paper_forward: mean_abs=1.4287391901016235, max_abs=10.0, mean_rel=0.3743496537208557, max_rel=4500.0, norm_rel=0.022260576486587524, ref_abs_avg=64.4645004272461, test_abs_avg=64.48133850097656
production_forward grad[5] vs paper_forward: mean_abs=1.0597639083862305, max_abs=5.0, mean_rel=0.09432978928089142, max_rel=3.4627389907836914, norm_rel=0.02351699396967888, ref_abs_avg=45.583980560302734, test_abs_avg=45.533546447753906
production_forward grad[6] vs paper_forward: mean_abs=1.3508360385894775, max_abs=10.0, mean_rel=0.1720639020204544, max_rel=3440.2880859375, norm_rel=0.02360694296658039, ref_abs_avg=57.700401306152344, test_abs_avg=57.69841384887695
production_forward grad[7] vs paper_forward: mean_abs=1.2474838495254517, max_abs=7.5, mean_rel=0.33485063910484314, max_rel=3281.249755859375, norm_rel=0.02200961671769619, ref_abs_avg=57.033931732177734, test_abs_avg=57.03258514404297
production_forward grad[8] vs paper_forward: mean_abs=0.9692695140838623, max_abs=3.75, mean_rel=0.09483826160430908, max_rel=7.831490516662598, norm_rel=0.021914415061473846, ref_abs_avg=43.84656524658203, test_abs_avg=43.866172790527344
production_forward grad[9] vs paper_forward: mean_abs=1.225643277168274, max_abs=10.0, mean_rel=0.1574225127696991, max_rel=3196.371337890625, norm_rel=0.023485135287046432, ref_abs_avg=52.56227111816406, test_abs_avg=52.56334686279297
production_forward grad[10] vs paper_forward: mean_abs=1.1250431537628174, max_abs=7.5, mean_rel=0.3072856366634369, max_rel=4187.5, norm_rel=0.021745026111602783, ref_abs_avg=52.0232048034668, test_abs_avg=52.034629821777344
production_forward grad[11] vs paper_forward: mean_abs=0.8551135063171387, max_abs=3.0, mean_rel=0.3428809940814972, max_rel=109.85114288330078, norm_rel=0.020378712564706802, ref_abs_avg=42.228885650634766, test_abs_avg=42.32545471191406
production_forward grad[12] vs paper_forward: mean_abs=1.1211903095245361, max_abs=8.0, mean_rel=0.15579676628112793, max_rel=1334.88720703125, norm_rel=0.023206336423754692, ref_abs_avg=48.6397590637207, test_abs_avg=48.637603759765625
production_forward grad[13] vs paper_forward: mean_abs=1.0291633605957031, max_abs=6.5, mean_rel=0.30172449350357056, max_rel=2874.999755859375, norm_rel=0.021505678072571754, ref_abs_avg=48.178104400634766, test_abs_avg=48.179359436035156
production_forward grad[14] vs paper_forward: mean_abs=0.8025331497192383, max_abs=3.75, mean_rel=0.09285569936037064, max_rel=14.48813247680664, norm_rel=0.020841768011450768, ref_abs_avg=38.63197708129883, test_abs_avg=38.68855285644531
production_forward grad[15] vs paper_forward: mean_abs=1.0532231330871582, max_abs=8.0, mean_rel=0.1463080644607544, max_rel=1648.6124267578125, norm_rel=0.023102903738617897, ref_abs_avg=45.857749938964844, test_abs_avg=45.85781478881836
production_forward grad[16] vs paper_forward: mean_abs=0.9621654152870178, max_abs=5.875, mean_rel=0.3366871476173401, max_rel=3249.999755859375, norm_rel=0.021417945623397827, ref_abs_avg=45.20634841918945, test_abs_avg=45.209228515625
production_forward grad[17] vs paper_forward: mean_abs=0.6788515448570251, max_abs=3.8125, mean_rel=0.22556465864181519, max_rel=57.430110931396484, norm_rel=0.02042185515165329, ref_abs_avg=34.867000579833984, test_abs_avg=34.85896682739258
production_forward grad[18] vs paper_forward: mean_abs=0.9877474904060364, max_abs=7.0, mean_rel=0.16379812359809875, max_rel=1659.0823974609375, norm_rel=0.023015161976218224, ref_abs_avg=43.20121765136719, test_abs_avg=43.201210021972656
production_forward grad[19] vs paper_forward: mean_abs=0.9065544009208679, max_abs=6.0, mean_rel=0.28487420082092285, max_rel=2781.249755859375, norm_rel=0.02124442718923092, ref_abs_avg=42.90481185913086, test_abs_avg=42.909584045410156
production_forward grad[20] vs paper_forward: mean_abs=0.7333865165710449, max_abs=3.25, mean_rel=0.11205551773309708, max_rel=14.054666519165039, norm_rel=0.021786412224173546, ref_abs_avg=33.54901123046875, test_abs_avg=33.60080337524414
production_forward grad[21] vs paper_forward: mean_abs=0.9370568990707397, max_abs=7.0, mean_rel=0.16241559386253357, max_rel=2275.44970703125, norm_rel=0.022886982187628746, ref_abs_avg=41.175411224365234, test_abs_avg=41.17963790893555
production_forward grad[22] vs paper_forward: mean_abs=0.8587852716445923, max_abs=5.125, mean_rel=0.27522873878479004, max_rel=2874.999755859375, norm_rel=0.021214943379163742, ref_abs_avg=40.67622375488281, test_abs_avg=40.67768478393555
production_forward grad[23] vs paper_forward: mean_abs=0.7150897979736328, max_abs=2.5625, mean_rel=0.09048091620206833, max_rel=12.751168251037598, norm_rel=0.021819470450282097, ref_abs_avg=32.734275817871094, test_abs_avg=32.75261306762695
production_forward grad[24] vs paper_forward: mean_abs=0.8907895684242249, max_abs=6.5, mean_rel=0.14371268451213837, max_rel=926.8893432617188, norm_rel=0.02277941070497036, ref_abs_avg=39.36503601074219, test_abs_avg=39.36601638793945
production_forward grad[25] vs paper_forward: mean_abs=0.816757082939148, max_abs=5.25, mean_rel=0.30725300312042236, max_rel=2999.999755859375, norm_rel=0.02117200382053852, ref_abs_avg=38.750267028808594, test_abs_avg=38.751708984375
production_forward grad[26] vs paper_forward: mean_abs=0.7741971015930176, max_abs=2.6875, mean_rel=0.08808326721191406, max_rel=4.595846176147461, norm_rel=0.022233031690120697, ref_abs_avg=35.62565231323242, test_abs_avg=35.61979675292969
production_forward grad[27] vs paper_forward: mean_abs=1.0194873809814453, max_abs=7.75, mean_rel=0.15869824588298798, max_rel=1499.529541015625, norm_rel=0.024378618225455284, ref_abs_avg=42.088218688964844, test_abs_avg=42.092620849609375
production_forward grad[28] vs paper_forward: mean_abs=0.9368040561676025, max_abs=6.875, mean_rel=0.29485762119293213, max_rel=2671.874755859375, norm_rel=0.02300352044403553, ref_abs_avg=40.95391845703125, test_abs_avg=40.965003967285156
production_forward grad[29] vs paper_forward: mean_abs=0.7741857767105103, max_abs=2.75, mean_rel=0.13032172620296478, max_rel=14.2080659866333, norm_rel=0.024685760959982872, ref_abs_avg=30.57917022705078, test_abs_avg=30.6331787109375
production_forward grad[30] vs paper_forward: mean_abs=0.9537384510040283, max_abs=7.0, mean_rel=0.15873983502388, max_rel=1185.3544921875, norm_rel=0.024836014956235886, ref_abs_avg=38.59954071044922, test_abs_avg=38.60040283203125
production_forward grad[31] vs paper_forward: mean_abs=0.8888170719146729, max_abs=5.75, mean_rel=0.26851290464401245, max_rel=2593.749755859375, norm_rel=0.02334977686405182, ref_abs_avg=38.17947769165039, test_abs_avg=38.18672561645508
production_forward grad[32] vs paper_forward: mean_abs=0.7006235122680664, max_abs=2.5, mean_rel=0.18541011214256287, max_rel=26.97990608215332, norm_rel=0.024587105959653854, ref_abs_avg=28.079326629638672, test_abs_avg=28.086814880371094
production_forward grad[33] vs paper_forward: mean_abs=0.885122537612915, max_abs=6.0, mean_rel=0.17063254117965698, max_rel=1204.791015625, norm_rel=0.024732215330004692, ref_abs_avg=35.94837188720703, test_abs_avg=35.95094680786133
production_forward grad[34] vs paper_forward: mean_abs=0.8289951086044312, max_abs=6.0, mean_rel=0.30302828550338745, max_rel=2328.125, norm_rel=0.02351626381278038, ref_abs_avg=35.415252685546875, test_abs_avg=35.42195129394531
production_forward grad[35] vs paper_forward: mean_abs=0.7051646709442139, max_abs=2.75, mean_rel=0.2918570041656494, max_rel=80.95826721191406, norm_rel=0.022669808939099312, ref_abs_avg=30.346580505371094, test_abs_avg=30.271120071411133
production_forward grad[36] vs paper_forward: mean_abs=0.8374652862548828, max_abs=6.0, mean_rel=0.1596986949443817, max_rel=1412.5885009765625, norm_rel=0.024466220289468765, ref_abs_avg=34.35063552856445, test_abs_avg=34.35258102416992
production_forward grad[37] vs paper_forward: mean_abs=0.772976279258728, max_abs=5.25, mean_rel=0.27956652641296387, max_rel=2593.749755859375, norm_rel=0.023127445951104164, ref_abs_avg=33.58771514892578, test_abs_avg=33.58769226074219
production_forward grad[38] vs paper_forward: mean_abs=0.6449499130249023, max_abs=2.5625, mean_rel=0.10288970917463303, max_rel=7.255936622619629, norm_rel=0.025730112567543983, ref_abs_avg=25.63794708251953, test_abs_avg=25.63312530517578
production_forward grad[39] vs paper_forward: mean_abs=0.7871012687683105, max_abs=5.0, mean_rel=0.1660660356283188, max_rel=1256.793212890625, norm_rel=0.02443799003958702, ref_abs_avg=32.34889602661133, test_abs_avg=32.351478576660156
production_forward grad[40] vs paper_forward: mean_abs=0.7322810888290405, max_abs=4.25, mean_rel=0.22980228066444397, max_rel=2749.999755859375, norm_rel=0.022900255396962166, ref_abs_avg=32.06559753417969, test_abs_avg=32.065486907958984
production_forward grad[41] vs paper_forward: mean_abs=0.5822672843933105, max_abs=2.25, mean_rel=0.16487060487270355, max_rel=23.252992630004883, norm_rel=0.023900197818875313, ref_abs_avg=24.418994903564453, test_abs_avg=24.4083194732666
production_forward grad[42] vs paper_forward: mean_abs=0.7532444000244141, max_abs=6.0, mean_rel=0.1554097980260849, max_rel=1488.1357421875, norm_rel=0.024128630757331848, ref_abs_avg=31.311620712280273, test_abs_avg=31.315258026123047
production_forward grad[43] vs paper_forward: mean_abs=0.6904522776603699, max_abs=4.03125, mean_rel=0.25621768832206726, max_rel=2812.499755859375, norm_rel=0.022576475515961647, ref_abs_avg=30.700672149658203, test_abs_avg=30.702285766601562
production_forward grad[44] vs paper_forward: mean_abs=0.548283576965332, max_abs=2.125, mean_rel=0.08933726698160172, max_rel=8.027938842773438, norm_rel=0.023251555860042572, ref_abs_avg=24.042049407958984, test_abs_avg=24.059539794921875
production_forward grad[45] vs paper_forward: mean_abs=0.7101624011993408, max_abs=5.0, mean_rel=0.15394799411296844, max_rel=1131.84521484375, norm_rel=0.023772872984409332, ref_abs_avg=30.009084701538086, test_abs_avg=30.013689041137695
production_forward grad[46] vs paper_forward: mean_abs=0.658964216709137, max_abs=4.0625, mean_rel=0.27546313405036926, max_rel=1749.9998779296875, norm_rel=0.022532163187861443, ref_abs_avg=29.37299919128418, test_abs_avg=29.374225616455078
production_forward grad[47] vs paper_forward: mean_abs=0.5545005798339844, max_abs=2.5, mean_rel=0.08592221140861511, max_rel=8.105752944946289, norm_rel=0.022649191319942474, ref_abs_avg=24.87755584716797, test_abs_avg=24.848522186279297
production_forward grad[48] vs paper_forward: mean_abs=0.6816216707229614, max_abs=5.0, mean_rel=0.16798317432403564, max_rel=1557.84521484375, norm_rel=0.02372608333826065, ref_abs_avg=28.84390640258789, test_abs_avg=28.843923568725586
production_forward grad[49] vs paper_forward: mean_abs=0.6320343017578125, max_abs=3.875, mean_rel=0.2635345160961151, max_rel=1906.2498779296875, norm_rel=0.02239011786878109, ref_abs_avg=28.348674774169922, test_abs_avg=28.35132598876953
production_forward grad[50] vs paper_forward: mean_abs=0.5630531311035156, max_abs=2.375, mean_rel=0.11008515954017639, max_rel=24.574819564819336, norm_rel=0.02319456823170185, ref_abs_avg=23.901294708251953, test_abs_avg=23.934539794921875
production_forward grad[51] vs paper_forward: mean_abs=0.7375171184539795, max_abs=5.0, mean_rel=0.1624119132757187, max_rel=1110.8856201171875, norm_rel=0.024888863787055016, ref_abs_avg=29.73512840270996, test_abs_avg=29.73550796508789
production_forward grad[52] vs paper_forward: mean_abs=0.6911967396736145, max_abs=4.75, mean_rel=0.28097742795944214, max_rel=2187.5, norm_rel=0.023592974990606308, ref_abs_avg=29.429466247558594, test_abs_avg=29.432714462280273
production_forward grad[53] vs paper_forward: mean_abs=0.5307579040527344, max_abs=2.4375, mean_rel=0.08656171709299088, max_rel=11.569784164428711, norm_rel=0.022958947345614433, ref_abs_avg=22.819429397583008, test_abs_avg=22.82869529724121
production_forward grad[54] vs paper_forward: mean_abs=0.6893993616104126, max_abs=5.0, mean_rel=0.15304023027420044, max_rel=1346.4300537109375, norm_rel=0.024559490382671356, ref_abs_avg=28.162166595458984, test_abs_avg=28.16333770751953
production_forward grad[55] vs paper_forward: mean_abs=0.638728678226471, max_abs=4.0, mean_rel=0.2437981218099594, max_rel=1999.9998779296875, norm_rel=0.02318512089550495, ref_abs_avg=27.574256896972656, test_abs_avg=27.575281143188477
production_forward grad[56] vs paper_forward: mean_abs=0.4996849298477173, max_abs=2.0, mean_rel=0.10273543000221252, max_rel=13.75314712524414, norm_rel=0.021747730672359467, ref_abs_avg=23.41936492919922, test_abs_avg=23.458402633666992
production_forward grad[57] vs paper_forward: mean_abs=0.6494271755218506, max_abs=5.0, mean_rel=0.16208970546722412, max_rel=1655.78125, norm_rel=0.024280698969960213, ref_abs_avg=26.802780151367188, test_abs_avg=26.803119659423828
production_forward grad[58] vs paper_forward: mean_abs=0.6010925769805908, max_abs=4.0, mean_rel=0.2707192599773407, max_rel=2375.0, norm_rel=0.02278299070894718, ref_abs_avg=26.47216033935547, test_abs_avg=26.47275161743164
production_forward grad[59] vs paper_forward: mean_abs=0.43432533740997314, max_abs=2.0, mean_rel=0.2862415611743927, max_rel=39.7345085144043, norm_rel=0.020142409950494766, ref_abs_avg=22.43923568725586, test_abs_avg=22.45928192138672
production_forward grad[60] vs paper_forward: mean_abs=0.6115803718566895, max_abs=4.25, mean_rel=0.15850207209587097, max_rel=1895.813232421875, norm_rel=0.023838412016630173, ref_abs_avg=25.728946685791016, test_abs_avg=25.72800064086914
production_forward grad[61] vs paper_forward: mean_abs=0.5609289407730103, max_abs=3.75, mean_rel=0.21984171867370605, max_rel=1937.4998779296875, norm_rel=0.022564908489584923, ref_abs_avg=24.959976196289062, test_abs_avg=24.9627685546875
production_forward grad[62] vs paper_forward: mean_abs=0.4411970376968384, max_abs=2.25, mean_rel=0.08531218767166138, max_rel=7.178616523742676, norm_rel=0.022241223603487015, ref_abs_avg=19.919864654541016, test_abs_avg=19.960468292236328
production_forward grad[63] vs paper_forward: mean_abs=0.573878824710846, max_abs=6.5, mean_rel=0.15186192095279694, max_rel=1080.2379150390625, norm_rel=0.023593321442604065, ref_abs_avg=24.368064880371094, test_abs_avg=24.367963790893555
production_forward grad[64] vs paper_forward: mean_abs=0.5276033878326416, max_abs=3.125, mean_rel=0.21650412678718567, max_rel=1593.7498779296875, norm_rel=0.021649545058608055, ref_abs_avg=24.378189086914062, test_abs_avg=24.38542938232422
production_forward grad[65] vs paper_forward: mean_abs=0.41967201232910156, max_abs=2.0, mean_rel=0.12269173562526703, max_rel=16.753162384033203, norm_rel=0.02349923364818096, ref_abs_avg=18.341964721679688, test_abs_avg=18.33340072631836
production_forward grad[66] vs paper_forward: mean_abs=0.5405771732330322, max_abs=4.75, mean_rel=0.15564453601837158, max_rel=1408.6077880859375, norm_rel=0.023337963968515396, ref_abs_avg=23.233043670654297, test_abs_avg=23.233047485351562
production_forward grad[67] vs paper_forward: mean_abs=0.5065510869026184, max_abs=3.921875, mean_rel=0.2637206017971039, max_rel=1749.9998779296875, norm_rel=0.022053973749279976, ref_abs_avg=22.900474548339844, test_abs_avg=22.902557373046875
production_forward grad[68] vs paper_forward: mean_abs=0.40482842922210693, max_abs=1.6875, mean_rel=0.1444169580936432, max_rel=23.013566970825195, norm_rel=0.022537171840667725, ref_abs_avg=18.082612991333008, test_abs_avg=18.079227447509766
production_forward grad[69] vs paper_forward: mean_abs=0.5174572467803955, max_abs=4.25, mean_rel=0.1420011818408966, max_rel=1148.1727294921875, norm_rel=0.022886911407113075, ref_abs_avg=22.665037155151367, test_abs_avg=22.66539764404297
production_forward grad[70] vs paper_forward: mean_abs=0.47982972860336304, max_abs=3.5, mean_rel=0.21886536478996277, max_rel=1562.4998779296875, norm_rel=0.021570345386862755, ref_abs_avg=22.28341293334961, test_abs_avg=22.294662475585938
production_forward grad[71] vs paper_forward: mean_abs=0.3724026679992676, max_abs=1.375, mean_rel=0.12234759330749512, max_rel=8.155623435974121, norm_rel=0.019735250622034073, ref_abs_avg=18.924327850341797, test_abs_avg=18.945831298828125
production_forward grad[72] vs paper_forward: mean_abs=0.49009862542152405, max_abs=4.25, mean_rel=0.1422308385372162, max_rel=1028.326171875, norm_rel=0.02260568179190159, ref_abs_avg=21.734825134277344, test_abs_avg=21.733768463134766
production_forward grad[73] vs paper_forward: mean_abs=0.45531773567199707, max_abs=4.0, mean_rel=0.2086537778377533, max_rel=1562.4998779296875, norm_rel=0.02054884284734726, ref_abs_avg=22.120101928710938, test_abs_avg=22.1262149810791
production_forward grad[74] vs paper_forward: mean_abs=0.41832515597343445, max_abs=1.75, mean_rel=0.0789732038974762, max_rel=4.248711109161377, norm_rel=0.022057360038161278, ref_abs_avg=19.145671844482422, test_abs_avg=19.12883186340332
production_forward grad[75] vs paper_forward: mean_abs=0.5409425497055054, max_abs=5.0, mean_rel=0.16154952347278595, max_rel=1141.4219970703125, norm_rel=0.024305032566189766, ref_abs_avg=22.310497283935547, test_abs_avg=22.308917999267578
production_forward grad[76] vs paper_forward: mean_abs=0.4999517500400543, max_abs=4.0, mean_rel=0.19122952222824097, max_rel=1250.0, norm_rel=0.022992169484496117, ref_abs_avg=21.77385711669922, test_abs_avg=21.770687103271484
production_forward grad[77] vs paper_forward: mean_abs=0.3877830505371094, max_abs=1.375, mean_rel=0.10702952742576599, max_rel=6.126616477966309, norm_rel=0.021775634959340096, ref_abs_avg=17.088821411132812, test_abs_avg=17.07878875732422
production_forward grad[78] vs paper_forward: mean_abs=0.500255823135376, max_abs=4.0, mean_rel=0.14207956194877625, max_rel=460.3016357421875, norm_rel=0.02346593700349331, ref_abs_avg=21.379249572753906, test_abs_avg=21.380115509033203
production_forward grad[79] vs paper_forward: mean_abs=0.4598374366760254, max_abs=3.25, mean_rel=0.19014865159988403, max_rel=1187.5, norm_rel=0.0218390841037035, ref_abs_avg=21.073320388793945, test_abs_avg=21.070798873901367
production_forward grad[80] vs paper_forward: mean_abs=0.3930366039276123, max_abs=1.9375, mean_rel=0.11236388981342316, max_rel=12.476417541503906, norm_rel=0.024607926607131958, ref_abs_avg=16.558956146240234, test_abs_avg=16.538818359375
production_forward grad[81] vs paper_forward: mean_abs=0.47003820538520813, max_abs=4.5, mean_rel=0.14758709073066711, max_rel=707.1844482421875, norm_rel=0.023226376622915268, ref_abs_avg=20.323837280273438, test_abs_avg=20.32267189025879
production_forward grad[82] vs paper_forward: mean_abs=0.4324099123477936, max_abs=3.5, mean_rel=0.2219514548778534, max_rel=1531.2498779296875, norm_rel=0.02172963321208954, ref_abs_avg=19.945226669311523, test_abs_avg=19.949199676513672
production_forward grad[83] vs paper_forward: mean_abs=0.34797191619873047, max_abs=1.375, mean_rel=0.06864595413208008, max_rel=3.26550555229187, norm_rel=0.02129395306110382, ref_abs_avg=16.363388061523438, test_abs_avg=16.397377014160156
production_forward grad[84] vs paper_forward: mean_abs=0.4342309236526489, max_abs=4.0, mean_rel=0.1499461829662323, max_rel=975.6656494140625, norm_rel=0.02278316393494606, ref_abs_avg=19.187698364257812, test_abs_avg=19.189422607421875
production_forward grad[85] vs paper_forward: mean_abs=0.39686456322669983, max_abs=3.25, mean_rel=0.2104130983352661, max_rel=1257.8125, norm_rel=0.020756615325808525, ref_abs_avg=19.24138641357422, test_abs_avg=19.23569107055664
production_forward grad[86] vs paper_forward: mean_abs=0.32847124338150024, max_abs=1.5, mean_rel=0.12276799231767654, max_rel=27.027538299560547, norm_rel=0.020333532243967056, ref_abs_avg=16.349231719970703, test_abs_avg=16.29897117614746
production_forward grad[87] vs paper_forward: mean_abs=0.41209012269973755, max_abs=4.0, mean_rel=0.1323392540216446, max_rel=525.663330078125, norm_rel=0.021958615630865097, ref_abs_avg=18.884910583496094, test_abs_avg=18.887359619140625
production_forward grad[88] vs paper_forward: mean_abs=0.3724917769432068, max_abs=3.0, mean_rel=0.17700177431106567, max_rel=968.7499389648438, norm_rel=0.020195908844470978, ref_abs_avg=18.56018829345703, test_abs_avg=18.56058692932129
production_forward grad[89] vs paper_forward: mean_abs=0.32230520248413086, max_abs=1.34375, mean_rel=0.0923488438129425, max_rel=8.939716339111328, norm_rel=0.02114424854516983, ref_abs_avg=15.295137405395508, test_abs_avg=15.320459365844727
production_forward grad[90] vs paper_forward: mean_abs=0.39299625158309937, max_abs=5.5, mean_rel=0.13237780332565308, max_rel=664.3700561523438, norm_rel=0.021411921828985214, ref_abs_avg=18.534738540649414, test_abs_avg=18.536903381347656
production_forward grad[91] vs paper_forward: mean_abs=0.3512110114097595, max_abs=3.25, mean_rel=0.16569828987121582, max_rel=1156.25, norm_rel=0.01924881711602211, ref_abs_avg=18.308181762695312, test_abs_avg=18.31078338623047
production_forward grad[92] vs paper_forward: mean_abs=0.31361961364746094, max_abs=1.125, mean_rel=0.06395156681537628, max_rel=2.937474012374878, norm_rel=0.02180304378271103, ref_abs_avg=14.411117553710938, test_abs_avg=14.402097702026367
production_forward grad[93] vs paper_forward: mean_abs=0.3710510730743408, max_abs=4.0, mean_rel=0.1278296858072281, max_rel=872.2398681640625, norm_rel=0.021091531962156296, ref_abs_avg=17.866851806640625, test_abs_avg=17.86756134033203
production_forward grad[94] vs paper_forward: mean_abs=0.33543866872787476, max_abs=3.25, mean_rel=0.17858579754829407, max_rel=1312.4998779296875, norm_rel=0.02008126489818096, ref_abs_avg=17.05765724182129, test_abs_avg=17.052417755126953
production_forward grad[95] vs paper_forward: mean_abs=0.2640763521194458, max_abs=1.0, mean_rel=0.19806545972824097, max_rel=34.869083404541016, norm_rel=0.019601596519351006, ref_abs_avg=14.271836280822754, test_abs_avg=14.257966995239258
production_forward grad[96] vs paper_forward: mean_abs=0.3481304943561554, max_abs=3.75, mean_rel=0.12228824943304062, max_rel=629.8927612304688, norm_rel=0.02055799402296543, ref_abs_avg=17.244049072265625, test_abs_avg=17.24372673034668
production_forward grad[97] vs paper_forward: mean_abs=0.31061723828315735, max_abs=3.25, mean_rel=0.15357741713523865, max_rel=1062.5, norm_rel=0.01824083738029003, ref_abs_avg=17.365198135375977, test_abs_avg=17.362083435058594
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001602493692189455, max_abs=0.03125
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008368080481886864, max_abs=0.328125, mean_rel=0.07330109179019928, max_rel=96.41415405273438, norm_rel=0.01993207260966301, ref_abs_avg=0.4505610764026642, test_abs_avg=0.45056506991386414
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.082350254058838, max_abs=64.0, mean_rel=0.16038934886455536, max_rel=205.89486694335938, norm_rel=0.020123112946748734, ref_abs_avg=308.77978515625, test_abs_avg=308.84967041015625
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2160851955413818, max_abs=5.5, mean_rel=0.16895803809165955, max_rel=53.69947052001953, norm_rel=0.020946374163031578, ref_abs_avg=58.351951599121094, test_abs_avg=58.2703857421875
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6087071895599365, max_abs=12.0, mean_rel=0.1719466894865036, max_rel=2175.7158203125, norm_rel=0.024687133729457855, ref_abs_avg=65.63430786132812, test_abs_avg=65.63629150390625
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.4844802618026733, max_abs=10.0, mean_rel=0.38586390018463135, max_rel=4500.0, norm_rel=0.02312660776078701, ref_abs_avg=64.4645004272461, test_abs_avg=64.4669189453125
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0589771270751953, max_abs=4.875, mean_rel=0.0849463939666748, max_rel=3.3165459632873535, norm_rel=0.023907141759991646, ref_abs_avg=45.583980560302734, test_abs_avg=45.62957763671875
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3951332569122314, max_abs=9.0, mean_rel=0.1738075315952301, max_rel=4154.24609375, norm_rel=0.024355677887797356, ref_abs_avg=57.700401306152344, test_abs_avg=57.69872283935547
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.2902238368988037, max_abs=8.25, mean_rel=0.35447606444358826, max_rel=5000.0, norm_rel=0.02274571731686592, ref_abs_avg=57.033931732177734, test_abs_avg=57.03138732910156
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9464530944824219, max_abs=4.0, mean_rel=0.08698670566082001, max_rel=4.773386478424072, norm_rel=0.021617215126752853, ref_abs_avg=43.84656524658203, test_abs_avg=43.83836364746094
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2605233192443848, max_abs=12.0, mean_rel=0.16422586143016815, max_rel=2758.521240234375, norm_rel=0.02413591556251049, ref_abs_avg=52.56227111816406, test_abs_avg=52.562049865722656
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1680927276611328, max_abs=7.0, mean_rel=0.3476297855377197, max_rel=5749.99951171875, norm_rel=0.02254854328930378, ref_abs_avg=52.0232048034668, test_abs_avg=52.03642272949219
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.8657264709472656, max_abs=3.625, mean_rel=0.3313571810722351, max_rel=97.20514678955078, norm_rel=0.020665708929300308, ref_abs_avg=42.228885650634766, test_abs_avg=42.289947509765625
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1539586782455444, max_abs=9.0, mean_rel=0.16050413250923157, max_rel=1301.9625244140625, norm_rel=0.02387012355029583, ref_abs_avg=48.6397590637207, test_abs_avg=48.638267517089844
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.0627474784851074, max_abs=6.625, mean_rel=0.2749003767967224, max_rel=3249.999755859375, norm_rel=0.022190066054463387, ref_abs_avg=48.178104400634766, test_abs_avg=48.178707122802734
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.834660530090332, max_abs=4.5, mean_rel=0.09612243622541428, max_rel=12.067138671875, norm_rel=0.02150464616715908, ref_abs_avg=38.63197708129883, test_abs_avg=38.66216278076172
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.0794837474822998, max_abs=7.0, mean_rel=0.15171536803245544, max_rel=2172.24951171875, norm_rel=0.023684494197368622, ref_abs_avg=45.857749938964844, test_abs_avg=45.857398986816406
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.9888743758201599, max_abs=6.0, mean_rel=0.37730276584625244, max_rel=2781.249755859375, norm_rel=0.022014785557985306, ref_abs_avg=45.20634841918945, test_abs_avg=45.20803451538086
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.7535145282745361, max_abs=3.1875, mean_rel=0.26445508003234863, max_rel=67.49837493896484, norm_rel=0.021583545953035355, ref_abs_avg=34.867000579833984, test_abs_avg=34.89109802246094
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0127214193344116, max_abs=8.0, mean_rel=0.16605767607688904, max_rel=1443.1142578125, norm_rel=0.02358873002231121, ref_abs_avg=43.20121765136719, test_abs_avg=43.199256896972656
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9306234121322632, max_abs=6.0, mean_rel=0.3088493347167969, max_rel=2437.5, norm_rel=0.0218113474547863, ref_abs_avg=42.90481185913086, test_abs_avg=42.905433654785156
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7539587020874023, max_abs=2.75, mean_rel=0.12102088332176208, max_rel=22.708282470703125, norm_rel=0.022748196497559547, ref_abs_avg=33.54901123046875, test_abs_avg=33.57988739013672
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9586185812950134, max_abs=8.0, mean_rel=0.15962883830070496, max_rel=2187.940185546875, norm_rel=0.023416247218847275, ref_abs_avg=41.175411224365234, test_abs_avg=41.1794319152832
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.8805874586105347, max_abs=5.75, mean_rel=0.29481467604637146, max_rel=2874.999755859375, norm_rel=0.021749695762991905, ref_abs_avg=40.67622375488281, test_abs_avg=40.67639923095703
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7238597869873047, max_abs=2.5, mean_rel=0.08765438199043274, max_rel=13.425769805908203, norm_rel=0.022043148055672646, ref_abs_avg=32.734275817871094, test_abs_avg=32.761009216308594
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9118307828903198, max_abs=6.25, mean_rel=0.14681623876094818, max_rel=848.1685791015625, norm_rel=0.0232932660728693, ref_abs_avg=39.36503601074219, test_abs_avg=39.36598205566406
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8368279933929443, max_abs=4.875, mean_rel=0.3154027462005615, max_rel=3499.999755859375, norm_rel=0.021703055128455162, ref_abs_avg=38.750267028808594, test_abs_avg=38.75129318237305
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.7868337631225586, max_abs=2.75, mean_rel=0.09310866892337799, max_rel=7.285527229309082, norm_rel=0.022418571636080742, ref_abs_avg=35.62565231323242, test_abs_avg=35.60543441772461
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.043372392654419, max_abs=7.0, mean_rel=0.16570448875427246, max_rel=1746.5550537109375, norm_rel=0.024942824617028236, ref_abs_avg=42.088218688964844, test_abs_avg=42.090415954589844
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.9629212617874146, max_abs=6.875, mean_rel=0.31129559874534607, max_rel=2281.25, norm_rel=0.023637760430574417, ref_abs_avg=40.95391845703125, test_abs_avg=40.964500427246094
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7874311208724976, max_abs=2.875, mean_rel=0.12579882144927979, max_rel=10.41549301147461, norm_rel=0.02501140534877777, ref_abs_avg=30.57917022705078, test_abs_avg=30.66754913330078
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9734834432601929, max_abs=7.0, mean_rel=0.16180330514907837, max_rel=1667.814208984375, norm_rel=0.025327764451503754, ref_abs_avg=38.59954071044922, test_abs_avg=38.596534729003906
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9083635210990906, max_abs=5.75, mean_rel=0.28177258372306824, max_rel=2187.5, norm_rel=0.02386048063635826, ref_abs_avg=38.17947769165039, test_abs_avg=38.18419647216797
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.6803351640701294, max_abs=3.3125, mean_rel=0.21117225289344788, max_rel=50.385494232177734, norm_rel=0.024066396057605743, ref_abs_avg=28.079326629638672, test_abs_avg=28.051292419433594
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9028343558311462, max_abs=5.5, mean_rel=0.1760106384754181, max_rel=2077.796875, norm_rel=0.02522500790655613, ref_abs_avg=35.94837188720703, test_abs_avg=35.948631286621094
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8495370149612427, max_abs=6.0, mean_rel=0.2788063883781433, max_rel=2406.25, norm_rel=0.024104641750454903, ref_abs_avg=35.415252685546875, test_abs_avg=35.41931915283203
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6739723682403564, max_abs=3.0, mean_rel=0.22079509496688843, max_rel=58.42654037475586, norm_rel=0.02255653403699398, ref_abs_avg=30.346580505371094, test_abs_avg=30.291831970214844
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8539137840270996, max_abs=6.0, mean_rel=0.16000378131866455, max_rel=823.8377075195312, norm_rel=0.024945532903075218, ref_abs_avg=34.35063552856445, test_abs_avg=34.352989196777344
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.7896194458007812, max_abs=5.25, mean_rel=0.28953537344932556, max_rel=2906.249755859375, norm_rel=0.02361493930220604, ref_abs_avg=33.58771514892578, test_abs_avg=33.59025573730469
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.655095100402832, max_abs=2.75, mean_rel=0.08975686132907867, max_rel=4.782321929931641, norm_rel=0.026569178327918053, ref_abs_avg=25.63794708251953, test_abs_avg=25.67413330078125
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8020312786102295, max_abs=5.75, mean_rel=0.1648414134979248, max_rel=1435.217041015625, norm_rel=0.02488628588616848, ref_abs_avg=32.34889602661133, test_abs_avg=32.3512077331543
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7474445104598999, max_abs=5.0, mean_rel=0.233527272939682, max_rel=3312.499755859375, norm_rel=0.023370470851659775, ref_abs_avg=32.06559753417969, test_abs_avg=32.062583923339844
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.591787576675415, max_abs=2.625, mean_rel=0.18195918202400208, max_rel=19.434614181518555, norm_rel=0.024140363559126854, ref_abs_avg=24.418994903564453, test_abs_avg=24.410053253173828
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7644270658493042, max_abs=5.5, mean_rel=0.1553032100200653, max_rel=1575.0841064453125, norm_rel=0.024505170062184334, ref_abs_avg=31.311620712280273, test_abs_avg=31.31437110900879
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7014849781990051, max_abs=4.5, mean_rel=0.24784205853939056, max_rel=2390.625, norm_rel=0.022915175184607506, ref_abs_avg=30.700672149658203, test_abs_avg=30.702409744262695
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5625495910644531, max_abs=2.25, mean_rel=0.15213455259799957, max_rel=35.377357482910156, norm_rel=0.02369999885559082, ref_abs_avg=24.042049407958984, test_abs_avg=24.06824493408203
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7195307016372681, max_abs=5.0, mean_rel=0.15797540545463562, max_rel=1284.3553466796875, norm_rel=0.02408837154507637, ref_abs_avg=30.009084701538086, test_abs_avg=30.0131893157959
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6721742749214172, max_abs=4.5, mean_rel=0.27027633786201477, max_rel=1843.7498779296875, norm_rel=0.022963326424360275, ref_abs_avg=29.37299919128418, test_abs_avg=29.376094818115234
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5649701356887817, max_abs=2.01953125, mean_rel=0.08513152599334717, max_rel=9.848311424255371, norm_rel=0.023314910009503365, ref_abs_avg=24.87755584716797, test_abs_avg=24.83341407775879
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6904738545417786, max_abs=5.375, mean_rel=0.1665520966053009, max_rel=1998.2734375, norm_rel=0.024027256295084953, ref_abs_avg=28.84390640258789, test_abs_avg=28.845600128173828
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6409425735473633, max_abs=4.5, mean_rel=0.2665964663028717, max_rel=2062.5, norm_rel=0.022717701271176338, ref_abs_avg=28.348674774169922, test_abs_avg=28.352222442626953
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.5985445976257324, max_abs=2.375, mean_rel=0.1278032660484314, max_rel=30.83618927001953, norm_rel=0.024301733821630478, ref_abs_avg=23.901294708251953, test_abs_avg=23.928695678710938
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.749596357345581, max_abs=5.25, mean_rel=0.16371417045593262, max_rel=998.1065063476562, norm_rel=0.025274915620684624, ref_abs_avg=29.73512840270996, test_abs_avg=29.735870361328125
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7008472681045532, max_abs=5.125, mean_rel=0.2788528501987457, max_rel=2312.5, norm_rel=0.02391986921429634, ref_abs_avg=29.429466247558594, test_abs_avg=29.43198013305664
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5176048278808594, max_abs=2.46875, mean_rel=0.06622889637947083, max_rel=3.986638069152832, norm_rel=0.022922638803720474, ref_abs_avg=22.819429397583008, test_abs_avg=22.79649543762207
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.6994308829307556, max_abs=6.0, mean_rel=0.15774501860141754, max_rel=1432.0408935546875, norm_rel=0.02491435781121254, ref_abs_avg=28.162166595458984, test_abs_avg=28.163360595703125
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6497526168823242, max_abs=4.75, mean_rel=0.2532380223274231, max_rel=2375.0, norm_rel=0.023578351363539696, ref_abs_avg=27.574256896972656, test_abs_avg=27.57292938232422
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5093040466308594, max_abs=2.0, mean_rel=0.08728508651256561, max_rel=8.20005989074707, norm_rel=0.022254128009080887, ref_abs_avg=23.41936492919922, test_abs_avg=23.467247009277344
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6584324836730957, max_abs=6.5, mean_rel=0.16209861636161804, max_rel=1792.494873046875, norm_rel=0.024611426517367363, ref_abs_avg=26.802780151367188, test_abs_avg=26.802854537963867
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6085642576217651, max_abs=4.0, mean_rel=0.25664615631103516, max_rel=2187.5, norm_rel=0.02304850146174431, ref_abs_avg=26.47216033935547, test_abs_avg=26.47356414794922
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.4508000612258911, max_abs=2.0, mean_rel=0.2335592359304428, max_rel=39.25971221923828, norm_rel=0.020967112854123116, ref_abs_avg=22.43923568725586, test_abs_avg=22.45342445373535
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.620047390460968, max_abs=4.5625, mean_rel=0.16332948207855225, max_rel=1682.507080078125, norm_rel=0.024157609790563583, ref_abs_avg=25.728946685791016, test_abs_avg=25.728065490722656
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.568189263343811, max_abs=3.75, mean_rel=0.22589343786239624, max_rel=2093.75, norm_rel=0.0228301752358675, ref_abs_avg=24.959976196289062, test_abs_avg=24.960708618164062
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4405832290649414, max_abs=2.125, mean_rel=0.07210090756416321, max_rel=4.416503429412842, norm_rel=0.021805843338370323, ref_abs_avg=19.919864654541016, test_abs_avg=19.94345474243164
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.581007719039917, max_abs=5.0, mean_rel=0.15498273074626923, max_rel=1376.8621826171875, norm_rel=0.023866934701800346, ref_abs_avg=24.368064880371094, test_abs_avg=24.367839813232422
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5350848436355591, max_abs=3.5, mean_rel=0.22241774201393127, max_rel=1968.7498779296875, norm_rel=0.021955318748950958, ref_abs_avg=24.378189086914062, test_abs_avg=24.38595962524414
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4331846237182617, max_abs=1.9375, mean_rel=0.12348558753728867, max_rel=13.520219802856445, norm_rel=0.02426842413842678, ref_abs_avg=18.341964721679688, test_abs_avg=18.3219051361084
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5465841293334961, max_abs=4.0, mean_rel=0.15278297662734985, max_rel=907.1171875, norm_rel=0.023589374497532845, ref_abs_avg=23.233043670654297, test_abs_avg=23.23296546936035
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.515709638595581, max_abs=3.75, mean_rel=0.27881723642349243, max_rel=1687.4998779296875, norm_rel=0.022448066622018814, ref_abs_avg=22.900474548339844, test_abs_avg=22.900333404541016
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.40569600462913513, max_abs=1.4375, mean_rel=0.1251314878463745, max_rel=18.895349502563477, norm_rel=0.02230847254395485, ref_abs_avg=18.082612991333008, test_abs_avg=18.075668334960938
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5215258598327637, max_abs=4.0, mean_rel=0.13913331925868988, max_rel=906.7833251953125, norm_rel=0.02304963394999504, ref_abs_avg=22.665037155151367, test_abs_avg=22.664154052734375
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.4842139184474945, max_abs=3.5, mean_rel=0.22335296869277954, max_rel=1624.9998779296875, norm_rel=0.021740907803177834, ref_abs_avg=22.28341293334961, test_abs_avg=22.296676635742188
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.3696126937866211, max_abs=1.375, mean_rel=0.1225292980670929, max_rel=9.967547416687012, norm_rel=0.01966739445924759, ref_abs_avg=18.924327850341797, test_abs_avg=18.94833755493164
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.49438923597335815, max_abs=4.25, mean_rel=0.14465542137622833, max_rel=665.4672241210938, norm_rel=0.02279098890721798, ref_abs_avg=21.734825134277344, test_abs_avg=21.73474884033203
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.45861679315567017, max_abs=4.5, mean_rel=0.2009817659854889, max_rel=1539.0623779296875, norm_rel=0.02068294771015644, ref_abs_avg=22.120101928710938, test_abs_avg=22.122554779052734
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4264974594116211, max_abs=1.75, mean_rel=0.08184567093849182, max_rel=3.526226758956909, norm_rel=0.022357536479830742, ref_abs_avg=19.145671844482422, test_abs_avg=19.12051773071289
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5457713603973389, max_abs=5.0, mean_rel=0.15810461342334747, max_rel=1118.9254150390625, norm_rel=0.024512911215424538, ref_abs_avg=22.310497283935547, test_abs_avg=22.309335708618164
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5028600692749023, max_abs=4.25, mean_rel=0.1914997696876526, max_rel=1046.875, norm_rel=0.02314799837768078, ref_abs_avg=21.77385711669922, test_abs_avg=21.768131256103516
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.38994717597961426, max_abs=1.408203125, mean_rel=0.10292036086320877, max_rel=4.954943656921387, norm_rel=0.022068625316023827, ref_abs_avg=17.088821411132812, test_abs_avg=17.06377410888672
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5044323801994324, max_abs=4.5, mean_rel=0.14328348636627197, max_rel=570.4386596679688, norm_rel=0.023655889555811882, ref_abs_avg=21.379249572753906, test_abs_avg=21.38112449645996
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4664101302623749, max_abs=3.5, mean_rel=0.193314328789711, max_rel=1265.625, norm_rel=0.022143451496958733, ref_abs_avg=21.073320388793945, test_abs_avg=21.066431045532227
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.37406420707702637, max_abs=1.625, mean_rel=0.11006838083267212, max_rel=8.610838890075684, norm_rel=0.02374277450144291, ref_abs_avg=16.558956146240234, test_abs_avg=16.534427642822266
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.47339802980422974, max_abs=4.5, mean_rel=0.14390461146831512, max_rel=661.4157104492188, norm_rel=0.023391151800751686, ref_abs_avg=20.323837280273438, test_abs_avg=20.323490142822266
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4385286569595337, max_abs=3.5, mean_rel=0.21794861555099487, max_rel=1937.4998779296875, norm_rel=0.022053416818380356, ref_abs_avg=19.945226669311523, test_abs_avg=19.944965362548828
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3534812927246094, max_abs=1.25, mean_rel=0.08075099438428879, max_rel=6.967892646789551, norm_rel=0.021479113027453423, ref_abs_avg=16.363388061523438, test_abs_avg=16.395004272460938
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4366392493247986, max_abs=4.75, mean_rel=0.14695264399051666, max_rel=927.6920776367188, norm_rel=0.022881103679537773, ref_abs_avg=19.187698364257812, test_abs_avg=19.190706253051758
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4007212221622467, max_abs=3.875, mean_rel=0.20355214178562164, max_rel=1187.5, norm_rel=0.020985811948776245, ref_abs_avg=19.24138641357422, test_abs_avg=19.237218856811523
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3312423825263977, max_abs=1.25, mean_rel=0.0722406655550003, max_rel=4.8360276222229, norm_rel=0.020384933799505234, ref_abs_avg=16.349231719970703, test_abs_avg=16.29857635498047
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4140883684158325, max_abs=4.0, mean_rel=0.13214877247810364, max_rel=531.2129516601562, norm_rel=0.02207597903907299, ref_abs_avg=18.884910583496094, test_abs_avg=18.88680648803711
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.3752642273902893, max_abs=3.25, mean_rel=0.1980394870042801, max_rel=1812.4998779296875, norm_rel=0.020343564450740814, ref_abs_avg=18.56018829345703, test_abs_avg=18.55793571472168
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3183894157409668, max_abs=1.25, mean_rel=0.0881408154964447, max_rel=5.716397285461426, norm_rel=0.02079620398581028, ref_abs_avg=15.295137405395508, test_abs_avg=15.345466613769531
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.3944058418273926, max_abs=5.0, mean_rel=0.13071539998054504, max_rel=677.493408203125, norm_rel=0.021474825218319893, ref_abs_avg=18.534738540649414, test_abs_avg=18.536415100097656
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3534390926361084, max_abs=3.0, mean_rel=0.17760245501995087, max_rel=1624.9998779296875, norm_rel=0.019374748691916466, ref_abs_avg=18.308181762695312, test_abs_avg=18.313800811767578
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.2929086685180664, max_abs=1.1875, mean_rel=0.06030962988734245, max_rel=2.5768465995788574, norm_rel=0.020381197333335876, ref_abs_avg=14.411117553710938, test_abs_avg=14.416457176208496
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3717721700668335, max_abs=4.5, mean_rel=0.1285165250301361, max_rel=714.5320434570312, norm_rel=0.021107181906700134, ref_abs_avg=17.866851806640625, test_abs_avg=17.86794662475586
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.33695337176322937, max_abs=3.375, mean_rel=0.17964212596416473, max_rel=999.9999389648438, norm_rel=0.020128590986132622, ref_abs_avg=17.05765724182129, test_abs_avg=17.051834106445312
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.27268660068511963, max_abs=1.25, mean_rel=0.19177472591400146, max_rel=34.41165542602539, norm_rel=0.02017645537853241, ref_abs_avg=14.271836280822754, test_abs_avg=14.259393692016602
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.34834378957748413, max_abs=4.0, mean_rel=0.1242811530828476, max_rel=822.7925415039062, norm_rel=0.02056334726512432, ref_abs_avg=17.244049072265625, test_abs_avg=17.24407958984375
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3133937418460846, max_abs=3.75, mean_rel=0.1522424966096878, max_rel=999.9999389648438, norm_rel=0.01842665858566761, ref_abs_avg=17.365198135375977, test_abs_avg=17.360671997070312
production_forward2 vs paper_forward output: mean_abs=0.0016011937987059355, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008027752861380577, max_abs=0.328125, mean_rel=0.07065524905920029, max_rel=92.44483184814453, norm_rel=0.01922805793583393, ref_abs_avg=0.4505610764026642, test_abs_avg=0.45057839155197144
production_forward2 grad[1] vs paper_forward: mean_abs=6.897032737731934, max_abs=56.0, mean_rel=0.150363028049469, max_rel=275.4474182128906, norm_rel=0.019576240330934525, ref_abs_avg=308.77978515625, test_abs_avg=308.8616943359375
production_forward2 grad[2] vs paper_forward: mean_abs=1.219684362411499, max_abs=5.0, mean_rel=0.1781032383441925, max_rel=56.567623138427734, norm_rel=0.020853275433182716, ref_abs_avg=58.351951599121094, test_abs_avg=58.31175231933594
production_forward2 grad[3] vs paper_forward: mean_abs=1.5615500211715698, max_abs=12.0, mean_rel=0.17659631371498108, max_rel=1717.2392578125, norm_rel=0.02395966462790966, ref_abs_avg=65.63430786132812, test_abs_avg=65.63926696777344
production_forward2 grad[4] vs paper_forward: mean_abs=1.4287391901016235, max_abs=10.0, mean_rel=0.3743496537208557, max_rel=4500.0, norm_rel=0.022260576486587524, ref_abs_avg=64.4645004272461, test_abs_avg=64.48133850097656
production_forward2 grad[5] vs paper_forward: mean_abs=1.0597639083862305, max_abs=5.0, mean_rel=0.09432978928089142, max_rel=3.4627389907836914, norm_rel=0.02351699396967888, ref_abs_avg=45.583980560302734, test_abs_avg=45.533546447753906
production_forward2 grad[6] vs paper_forward: mean_abs=1.3508360385894775, max_abs=10.0, mean_rel=0.1720639020204544, max_rel=3440.2880859375, norm_rel=0.02360694296658039, ref_abs_avg=57.700401306152344, test_abs_avg=57.69841384887695
production_forward2 grad[7] vs paper_forward: mean_abs=1.2474838495254517, max_abs=7.5, mean_rel=0.33485063910484314, max_rel=3281.249755859375, norm_rel=0.02200961671769619, ref_abs_avg=57.033931732177734, test_abs_avg=57.03258514404297
production_forward2 grad[8] vs paper_forward: mean_abs=0.9692695140838623, max_abs=3.75, mean_rel=0.09483826160430908, max_rel=7.831490516662598, norm_rel=0.021914415061473846, ref_abs_avg=43.84656524658203, test_abs_avg=43.866172790527344
production_forward2 grad[9] vs paper_forward: mean_abs=1.225643277168274, max_abs=10.0, mean_rel=0.1574225127696991, max_rel=3196.371337890625, norm_rel=0.023485135287046432, ref_abs_avg=52.56227111816406, test_abs_avg=52.56334686279297
production_forward2 grad[10] vs paper_forward: mean_abs=1.1250431537628174, max_abs=7.5, mean_rel=0.3072856366634369, max_rel=4187.5, norm_rel=0.021745026111602783, ref_abs_avg=52.0232048034668, test_abs_avg=52.034629821777344
production_forward2 grad[11] vs paper_forward: mean_abs=0.8551135063171387, max_abs=3.0, mean_rel=0.3428809940814972, max_rel=109.85114288330078, norm_rel=0.020378712564706802, ref_abs_avg=42.228885650634766, test_abs_avg=42.32545471191406
production_forward2 grad[12] vs paper_forward: mean_abs=1.1211903095245361, max_abs=8.0, mean_rel=0.15579676628112793, max_rel=1334.88720703125, norm_rel=0.023206336423754692, ref_abs_avg=48.6397590637207, test_abs_avg=48.637603759765625
production_forward2 grad[13] vs paper_forward: mean_abs=1.0291633605957031, max_abs=6.5, mean_rel=0.30172449350357056, max_rel=2874.999755859375, norm_rel=0.021505678072571754, ref_abs_avg=48.178104400634766, test_abs_avg=48.179359436035156
production_forward2 grad[14] vs paper_forward: mean_abs=0.8025331497192383, max_abs=3.75, mean_rel=0.09285569936037064, max_rel=14.48813247680664, norm_rel=0.020841768011450768, ref_abs_avg=38.63197708129883, test_abs_avg=38.68855285644531
production_forward2 grad[15] vs paper_forward: mean_abs=1.0532231330871582, max_abs=8.0, mean_rel=0.1463080644607544, max_rel=1648.6124267578125, norm_rel=0.023102903738617897, ref_abs_avg=45.857749938964844, test_abs_avg=45.85781478881836
production_forward2 grad[16] vs paper_forward: mean_abs=0.9621654152870178, max_abs=5.875, mean_rel=0.3366871476173401, max_rel=3249.999755859375, norm_rel=0.021417945623397827, ref_abs_avg=45.20634841918945, test_abs_avg=45.209228515625
production_forward2 grad[17] vs paper_forward: mean_abs=0.6788515448570251, max_abs=3.8125, mean_rel=0.22556465864181519, max_rel=57.430110931396484, norm_rel=0.02042185515165329, ref_abs_avg=34.867000579833984, test_abs_avg=34.85896682739258
production_forward2 grad[18] vs paper_forward: mean_abs=0.9877474904060364, max_abs=7.0, mean_rel=0.16379812359809875, max_rel=1659.0823974609375, norm_rel=0.023015161976218224, ref_abs_avg=43.20121765136719, test_abs_avg=43.201210021972656
production_forward2 grad[19] vs paper_forward: mean_abs=0.9065544009208679, max_abs=6.0, mean_rel=0.28487420082092285, max_rel=2781.249755859375, norm_rel=0.02124442718923092, ref_abs_avg=42.90481185913086, test_abs_avg=42.909584045410156
production_forward2 grad[20] vs paper_forward: mean_abs=0.7333865165710449, max_abs=3.25, mean_rel=0.11205551773309708, max_rel=14.054666519165039, norm_rel=0.021786412224173546, ref_abs_avg=33.54901123046875, test_abs_avg=33.60080337524414
production_forward2 grad[21] vs paper_forward: mean_abs=0.9370568990707397, max_abs=7.0, mean_rel=0.16241559386253357, max_rel=2275.44970703125, norm_rel=0.022886982187628746, ref_abs_avg=41.175411224365234, test_abs_avg=41.17963790893555
production_forward2 grad[22] vs paper_forward: mean_abs=0.8587852716445923, max_abs=5.125, mean_rel=0.27522873878479004, max_rel=2874.999755859375, norm_rel=0.021214943379163742, ref_abs_avg=40.67622375488281, test_abs_avg=40.67768478393555
production_forward2 grad[23] vs paper_forward: mean_abs=0.7150897979736328, max_abs=2.5625, mean_rel=0.09048091620206833, max_rel=12.751168251037598, norm_rel=0.021819470450282097, ref_abs_avg=32.734275817871094, test_abs_avg=32.75261306762695
production_forward2 grad[24] vs paper_forward: mean_abs=0.8907895684242249, max_abs=6.5, mean_rel=0.14371268451213837, max_rel=926.8893432617188, norm_rel=0.02277941070497036, ref_abs_avg=39.36503601074219, test_abs_avg=39.36601638793945
production_forward2 grad[25] vs paper_forward: mean_abs=0.816757082939148, max_abs=5.25, mean_rel=0.30725300312042236, max_rel=2999.999755859375, norm_rel=0.02117200382053852, ref_abs_avg=38.750267028808594, test_abs_avg=38.751708984375
production_forward2 grad[26] vs paper_forward: mean_abs=0.7741971015930176, max_abs=2.6875, mean_rel=0.08808326721191406, max_rel=4.595846176147461, norm_rel=0.022233031690120697, ref_abs_avg=35.62565231323242, test_abs_avg=35.61979675292969
production_forward2 grad[27] vs paper_forward: mean_abs=1.0194873809814453, max_abs=7.75, mean_rel=0.15869824588298798, max_rel=1499.529541015625, norm_rel=0.024378618225455284, ref_abs_avg=42.088218688964844, test_abs_avg=42.092620849609375
production_forward2 grad[28] vs paper_forward: mean_abs=0.9368040561676025, max_abs=6.875, mean_rel=0.29485762119293213, max_rel=2671.874755859375, norm_rel=0.02300352044403553, ref_abs_avg=40.95391845703125, test_abs_avg=40.965003967285156
production_forward2 grad[29] vs paper_forward: mean_abs=0.7741857767105103, max_abs=2.75, mean_rel=0.13032172620296478, max_rel=14.2080659866333, norm_rel=0.024685760959982872, ref_abs_avg=30.57917022705078, test_abs_avg=30.6331787109375
production_forward2 grad[30] vs paper_forward: mean_abs=0.9537384510040283, max_abs=7.0, mean_rel=0.15873983502388, max_rel=1185.3544921875, norm_rel=0.024836014956235886, ref_abs_avg=38.59954071044922, test_abs_avg=38.60040283203125
production_forward2 grad[31] vs paper_forward: mean_abs=0.8888170719146729, max_abs=5.75, mean_rel=0.26851290464401245, max_rel=2593.749755859375, norm_rel=0.02334977686405182, ref_abs_avg=38.17947769165039, test_abs_avg=38.18672561645508
production_forward2 grad[32] vs paper_forward: mean_abs=0.7006235122680664, max_abs=2.5, mean_rel=0.18541011214256287, max_rel=26.97990608215332, norm_rel=0.024587105959653854, ref_abs_avg=28.079326629638672, test_abs_avg=28.086814880371094
production_forward2 grad[33] vs paper_forward: mean_abs=0.885122537612915, max_abs=6.0, mean_rel=0.17063254117965698, max_rel=1204.791015625, norm_rel=0.024732215330004692, ref_abs_avg=35.94837188720703, test_abs_avg=35.95094680786133
production_forward2 grad[34] vs paper_forward: mean_abs=0.8289951086044312, max_abs=6.0, mean_rel=0.30302828550338745, max_rel=2328.125, norm_rel=0.02351626381278038, ref_abs_avg=35.415252685546875, test_abs_avg=35.42195129394531
production_forward2 grad[35] vs paper_forward: mean_abs=0.7051646709442139, max_abs=2.75, mean_rel=0.2918570041656494, max_rel=80.95826721191406, norm_rel=0.022669808939099312, ref_abs_avg=30.346580505371094, test_abs_avg=30.271120071411133
production_forward2 grad[36] vs paper_forward: mean_abs=0.8374652862548828, max_abs=6.0, mean_rel=0.1596986949443817, max_rel=1412.5885009765625, norm_rel=0.024466220289468765, ref_abs_avg=34.35063552856445, test_abs_avg=34.35258102416992
production_forward2 grad[37] vs paper_forward: mean_abs=0.772976279258728, max_abs=5.25, mean_rel=0.27956652641296387, max_rel=2593.749755859375, norm_rel=0.023127445951104164, ref_abs_avg=33.58771514892578, test_abs_avg=33.58769226074219
production_forward2 grad[38] vs paper_forward: mean_abs=0.6449499130249023, max_abs=2.5625, mean_rel=0.10288970917463303, max_rel=7.255936622619629, norm_rel=0.025730112567543983, ref_abs_avg=25.63794708251953, test_abs_avg=25.63312530517578
production_forward2 grad[39] vs paper_forward: mean_abs=0.7871012687683105, max_abs=5.0, mean_rel=0.1660660356283188, max_rel=1256.793212890625, norm_rel=0.02443799003958702, ref_abs_avg=32.34889602661133, test_abs_avg=32.351478576660156
production_forward2 grad[40] vs paper_forward: mean_abs=0.7322810888290405, max_abs=4.25, mean_rel=0.22980228066444397, max_rel=2749.999755859375, norm_rel=0.022900255396962166, ref_abs_avg=32.06559753417969, test_abs_avg=32.065486907958984
production_forward2 grad[41] vs paper_forward: mean_abs=0.5822672843933105, max_abs=2.25, mean_rel=0.16487060487270355, max_rel=23.252992630004883, norm_rel=0.023900197818875313, ref_abs_avg=24.418994903564453, test_abs_avg=24.4083194732666
production_forward2 grad[42] vs paper_forward: mean_abs=0.7532444000244141, max_abs=6.0, mean_rel=0.1554097980260849, max_rel=1488.1357421875, norm_rel=0.024128630757331848, ref_abs_avg=31.311620712280273, test_abs_avg=31.315258026123047
production_forward2 grad[43] vs paper_forward: mean_abs=0.6904522776603699, max_abs=4.03125, mean_rel=0.25621768832206726, max_rel=2812.499755859375, norm_rel=0.022576475515961647, ref_abs_avg=30.700672149658203, test_abs_avg=30.702285766601562
production_forward2 grad[44] vs paper_forward: mean_abs=0.548283576965332, max_abs=2.125, mean_rel=0.08933726698160172, max_rel=8.027938842773438, norm_rel=0.023251555860042572, ref_abs_avg=24.042049407958984, test_abs_avg=24.059539794921875
production_forward2 grad[45] vs paper_forward: mean_abs=0.7101624011993408, max_abs=5.0, mean_rel=0.15394799411296844, max_rel=1131.84521484375, norm_rel=0.023772872984409332, ref_abs_avg=30.009084701538086, test_abs_avg=30.013689041137695
production_forward2 grad[46] vs paper_forward: mean_abs=0.658964216709137, max_abs=4.0625, mean_rel=0.27546313405036926, max_rel=1749.9998779296875, norm_rel=0.022532163187861443, ref_abs_avg=29.37299919128418, test_abs_avg=29.374225616455078
production_forward2 grad[47] vs paper_forward: mean_abs=0.5545005798339844, max_abs=2.5, mean_rel=0.08592221140861511, max_rel=8.105752944946289, norm_rel=0.022649191319942474, ref_abs_avg=24.87755584716797, test_abs_avg=24.848522186279297
production_forward2 grad[48] vs paper_forward: mean_abs=0.6816216707229614, max_abs=5.0, mean_rel=0.16798317432403564, max_rel=1557.84521484375, norm_rel=0.02372608333826065, ref_abs_avg=28.84390640258789, test_abs_avg=28.843923568725586
production_forward2 grad[49] vs paper_forward: mean_abs=0.6320343017578125, max_abs=3.875, mean_rel=0.2635345160961151, max_rel=1906.2498779296875, norm_rel=0.02239011786878109, ref_abs_avg=28.348674774169922, test_abs_avg=28.35132598876953
production_forward2 grad[50] vs paper_forward: mean_abs=0.5630531311035156, max_abs=2.375, mean_rel=0.11008515954017639, max_rel=24.574819564819336, norm_rel=0.02319456823170185, ref_abs_avg=23.901294708251953, test_abs_avg=23.934539794921875
production_forward2 grad[51] vs paper_forward: mean_abs=0.7375171184539795, max_abs=5.0, mean_rel=0.1624119132757187, max_rel=1110.8856201171875, norm_rel=0.024888863787055016, ref_abs_avg=29.73512840270996, test_abs_avg=29.73550796508789
production_forward2 grad[52] vs paper_forward: mean_abs=0.6911967396736145, max_abs=4.75, mean_rel=0.28097742795944214, max_rel=2187.5, norm_rel=0.023592974990606308, ref_abs_avg=29.429466247558594, test_abs_avg=29.432714462280273
production_forward2 grad[53] vs paper_forward: mean_abs=0.5307579040527344, max_abs=2.4375, mean_rel=0.08656171709299088, max_rel=11.569784164428711, norm_rel=0.022958947345614433, ref_abs_avg=22.819429397583008, test_abs_avg=22.82869529724121
production_forward2 grad[54] vs paper_forward: mean_abs=0.6893993616104126, max_abs=5.0, mean_rel=0.15304023027420044, max_rel=1346.4300537109375, norm_rel=0.024559490382671356, ref_abs_avg=28.162166595458984, test_abs_avg=28.16333770751953
production_forward2 grad[55] vs paper_forward: mean_abs=0.638728678226471, max_abs=4.0, mean_rel=0.2437981218099594, max_rel=1999.9998779296875, norm_rel=0.02318512089550495, ref_abs_avg=27.574256896972656, test_abs_avg=27.575281143188477
production_forward2 grad[56] vs paper_forward: mean_abs=0.4996849298477173, max_abs=2.0, mean_rel=0.10273543000221252, max_rel=13.75314712524414, norm_rel=0.021747730672359467, ref_abs_avg=23.41936492919922, test_abs_avg=23.458402633666992
production_forward2 grad[57] vs paper_forward: mean_abs=0.6494271755218506, max_abs=5.0, mean_rel=0.16208970546722412, max_rel=1655.78125, norm_rel=0.024280698969960213, ref_abs_avg=26.802780151367188, test_abs_avg=26.803119659423828
production_forward2 grad[58] vs paper_forward: mean_abs=0.6010925769805908, max_abs=4.0, mean_rel=0.2707192599773407, max_rel=2375.0, norm_rel=0.02278299070894718, ref_abs_avg=26.47216033935547, test_abs_avg=26.47275161743164
production_forward2 grad[59] vs paper_forward: mean_abs=0.43432533740997314, max_abs=2.0, mean_rel=0.2862415611743927, max_rel=39.7345085144043, norm_rel=0.020142409950494766, ref_abs_avg=22.43923568725586, test_abs_avg=22.45928192138672
production_forward2 grad[60] vs paper_forward: mean_abs=0.6115803718566895, max_abs=4.25, mean_rel=0.15850207209587097, max_rel=1895.813232421875, norm_rel=0.023838412016630173, ref_abs_avg=25.728946685791016, test_abs_avg=25.72800064086914
production_forward2 grad[61] vs paper_forward: mean_abs=0.5609289407730103, max_abs=3.75, mean_rel=0.21984171867370605, max_rel=1937.4998779296875, norm_rel=0.022564908489584923, ref_abs_avg=24.959976196289062, test_abs_avg=24.9627685546875
production_forward2 grad[62] vs paper_forward: mean_abs=0.4411970376968384, max_abs=2.25, mean_rel=0.08531218767166138, max_rel=7.178616523742676, norm_rel=0.022241223603487015, ref_abs_avg=19.919864654541016, test_abs_avg=19.960468292236328
production_forward2 grad[63] vs paper_forward: mean_abs=0.573878824710846, max_abs=6.5, mean_rel=0.15186192095279694, max_rel=1080.2379150390625, norm_rel=0.023593321442604065, ref_abs_avg=24.368064880371094, test_abs_avg=24.367963790893555
production_forward2 grad[64] vs paper_forward: mean_abs=0.5276033878326416, max_abs=3.125, mean_rel=0.21650412678718567, max_rel=1593.7498779296875, norm_rel=0.021649545058608055, ref_abs_avg=24.378189086914062, test_abs_avg=24.38542938232422
production_forward2 grad[65] vs paper_forward: mean_abs=0.41967201232910156, max_abs=2.0, mean_rel=0.12269173562526703, max_rel=16.753162384033203, norm_rel=0.02349923364818096, ref_abs_avg=18.341964721679688, test_abs_avg=18.33340072631836
production_forward2 grad[66] vs paper_forward: mean_abs=0.5405771732330322, max_abs=4.75, mean_rel=0.15564453601837158, max_rel=1408.6077880859375, norm_rel=0.023337963968515396, ref_abs_avg=23.233043670654297, test_abs_avg=23.233047485351562
production_forward2 grad[67] vs paper_forward: mean_abs=0.5065510869026184, max_abs=3.921875, mean_rel=0.2637206017971039, max_rel=1749.9998779296875, norm_rel=0.022053973749279976, ref_abs_avg=22.900474548339844, test_abs_avg=22.902557373046875
production_forward2 grad[68] vs paper_forward: mean_abs=0.40482842922210693, max_abs=1.6875, mean_rel=0.1444169580936432, max_rel=23.013566970825195, norm_rel=0.022537171840667725, ref_abs_avg=18.082612991333008, test_abs_avg=18.079227447509766
production_forward2 grad[69] vs paper_forward: mean_abs=0.5174572467803955, max_abs=4.25, mean_rel=0.1420011818408966, max_rel=1148.1727294921875, norm_rel=0.022886911407113075, ref_abs_avg=22.665037155151367, test_abs_avg=22.66539764404297
production_forward2 grad[70] vs paper_forward: mean_abs=0.47982972860336304, max_abs=3.5, mean_rel=0.21886536478996277, max_rel=1562.4998779296875, norm_rel=0.021570345386862755, ref_abs_avg=22.28341293334961, test_abs_avg=22.294662475585938
production_forward2 grad[71] vs paper_forward: mean_abs=0.3724026679992676, max_abs=1.375, mean_rel=0.12234759330749512, max_rel=8.155623435974121, norm_rel=0.019735250622034073, ref_abs_avg=18.924327850341797, test_abs_avg=18.945831298828125
production_forward2 grad[72] vs paper_forward: mean_abs=0.49009862542152405, max_abs=4.25, mean_rel=0.1422308385372162, max_rel=1028.326171875, norm_rel=0.02260568179190159, ref_abs_avg=21.734825134277344, test_abs_avg=21.733768463134766
production_forward2 grad[73] vs paper_forward: mean_abs=0.45531773567199707, max_abs=4.0, mean_rel=0.2086537778377533, max_rel=1562.4998779296875, norm_rel=0.02054884284734726, ref_abs_avg=22.120101928710938, test_abs_avg=22.1262149810791
production_forward2 grad[74] vs paper_forward: mean_abs=0.41832515597343445, max_abs=1.75, mean_rel=0.0789732038974762, max_rel=4.248711109161377, norm_rel=0.022057360038161278, ref_abs_avg=19.145671844482422, test_abs_avg=19.12883186340332
production_forward2 grad[75] vs paper_forward: mean_abs=0.5409425497055054, max_abs=5.0, mean_rel=0.16154952347278595, max_rel=1141.4219970703125, norm_rel=0.024305032566189766, ref_abs_avg=22.310497283935547, test_abs_avg=22.308917999267578
production_forward2 grad[76] vs paper_forward: mean_abs=0.4999517500400543, max_abs=4.0, mean_rel=0.19122952222824097, max_rel=1250.0, norm_rel=0.022992169484496117, ref_abs_avg=21.77385711669922, test_abs_avg=21.770687103271484
production_forward2 grad[77] vs paper_forward: mean_abs=0.3877830505371094, max_abs=1.375, mean_rel=0.10702952742576599, max_rel=6.126616477966309, norm_rel=0.021775634959340096, ref_abs_avg=17.088821411132812, test_abs_avg=17.07878875732422
production_forward2 grad[78] vs paper_forward: mean_abs=0.500255823135376, max_abs=4.0, mean_rel=0.14207956194877625, max_rel=460.3016357421875, norm_rel=0.02346593700349331, ref_abs_avg=21.379249572753906, test_abs_avg=21.380115509033203
production_forward2 grad[79] vs paper_forward: mean_abs=0.4598374366760254, max_abs=3.25, mean_rel=0.19014865159988403, max_rel=1187.5, norm_rel=0.0218390841037035, ref_abs_avg=21.073320388793945, test_abs_avg=21.070798873901367
production_forward2 grad[80] vs paper_forward: mean_abs=0.3930366039276123, max_abs=1.9375, mean_rel=0.11236388981342316, max_rel=12.476417541503906, norm_rel=0.024607926607131958, ref_abs_avg=16.558956146240234, test_abs_avg=16.538818359375
production_forward2 grad[81] vs paper_forward: mean_abs=0.47003820538520813, max_abs=4.5, mean_rel=0.14758709073066711, max_rel=707.1844482421875, norm_rel=0.023226376622915268, ref_abs_avg=20.323837280273438, test_abs_avg=20.32267189025879
production_forward2 grad[82] vs paper_forward: mean_abs=0.4324099123477936, max_abs=3.5, mean_rel=0.2219514548778534, max_rel=1531.2498779296875, norm_rel=0.02172963321208954, ref_abs_avg=19.945226669311523, test_abs_avg=19.949199676513672
production_forward2 grad[83] vs paper_forward: mean_abs=0.34797191619873047, max_abs=1.375, mean_rel=0.06864595413208008, max_rel=3.26550555229187, norm_rel=0.02129395306110382, ref_abs_avg=16.363388061523438, test_abs_avg=16.397377014160156
production_forward2 grad[84] vs paper_forward: mean_abs=0.4342309236526489, max_abs=4.0, mean_rel=0.1499461829662323, max_rel=975.6656494140625, norm_rel=0.02278316393494606, ref_abs_avg=19.187698364257812, test_abs_avg=19.189422607421875
production_forward2 grad[85] vs paper_forward: mean_abs=0.39686456322669983, max_abs=3.25, mean_rel=0.2104130983352661, max_rel=1257.8125, norm_rel=0.020756615325808525, ref_abs_avg=19.24138641357422, test_abs_avg=19.23569107055664
production_forward2 grad[86] vs paper_forward: mean_abs=0.32847124338150024, max_abs=1.5, mean_rel=0.12276799231767654, max_rel=27.027538299560547, norm_rel=0.020333532243967056, ref_abs_avg=16.349231719970703, test_abs_avg=16.29897117614746
production_forward2 grad[87] vs paper_forward: mean_abs=0.41209012269973755, max_abs=4.0, mean_rel=0.1323392540216446, max_rel=525.663330078125, norm_rel=0.021958615630865097, ref_abs_avg=18.884910583496094, test_abs_avg=18.887359619140625
production_forward2 grad[88] vs paper_forward: mean_abs=0.3724917769432068, max_abs=3.0, mean_rel=0.17700177431106567, max_rel=968.7499389648438, norm_rel=0.020195908844470978, ref_abs_avg=18.56018829345703, test_abs_avg=18.56058692932129
production_forward2 grad[89] vs paper_forward: mean_abs=0.32230520248413086, max_abs=1.34375, mean_rel=0.0923488438129425, max_rel=8.939716339111328, norm_rel=0.02114424854516983, ref_abs_avg=15.295137405395508, test_abs_avg=15.320459365844727
production_forward2 grad[90] vs paper_forward: mean_abs=0.39299625158309937, max_abs=5.5, mean_rel=0.13237780332565308, max_rel=664.3700561523438, norm_rel=0.021411921828985214, ref_abs_avg=18.534738540649414, test_abs_avg=18.536903381347656
production_forward2 grad[91] vs paper_forward: mean_abs=0.3512110114097595, max_abs=3.25, mean_rel=0.16569828987121582, max_rel=1156.25, norm_rel=0.01924881711602211, ref_abs_avg=18.308181762695312, test_abs_avg=18.31078338623047
production_forward2 grad[92] vs paper_forward: mean_abs=0.31361961364746094, max_abs=1.125, mean_rel=0.06395156681537628, max_rel=2.937474012374878, norm_rel=0.02180304378271103, ref_abs_avg=14.411117553710938, test_abs_avg=14.402097702026367
production_forward2 grad[93] vs paper_forward: mean_abs=0.3710510730743408, max_abs=4.0, mean_rel=0.1278296858072281, max_rel=872.2398681640625, norm_rel=0.021091531962156296, ref_abs_avg=17.866851806640625, test_abs_avg=17.86756134033203
production_forward2 grad[94] vs paper_forward: mean_abs=0.33543866872787476, max_abs=3.25, mean_rel=0.17858579754829407, max_rel=1312.4998779296875, norm_rel=0.02008126489818096, ref_abs_avg=17.05765724182129, test_abs_avg=17.052417755126953
production_forward2 grad[95] vs paper_forward: mean_abs=0.2640763521194458, max_abs=1.0, mean_rel=0.19806545972824097, max_rel=34.869083404541016, norm_rel=0.019601596519351006, ref_abs_avg=14.271836280822754, test_abs_avg=14.257966995239258
production_forward2 grad[96] vs paper_forward: mean_abs=0.3481304943561554, max_abs=3.75, mean_rel=0.12228824943304062, max_rel=629.8927612304688, norm_rel=0.02055799402296543, ref_abs_avg=17.244049072265625, test_abs_avg=17.24372673034668
production_forward2 grad[97] vs paper_forward: mean_abs=0.31061723828315735, max_abs=3.25, mean_rel=0.15357741713523865, max_rel=1062.5, norm_rel=0.01824083738029003, ref_abs_avg=17.365198135375977, test_abs_avg=17.362083435058594
identity layers + randn queries
production_forward fwd+bwd:  113.542 ms
production_forward bwd-only: 95.973 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.324 GiB, fwd+bwd=10.324 GiB
paper_forward fwd+bwd:  381.613 ms
paper_forward bwd-only: 301.494 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.742 GiB, fwd+bwd=32.492 GiB
production_forward2 fwd+bwd:  114.420 ms
production_forward2 bwd-only: 95.957 ms
production_forward2 peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=2.324 GiB, fwd+bwd=10.324 GiB
torch_compile_phases_forward fwd+bwd:  167.053 ms
torch_compile_phases_forward bwd-only: 132.852 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016056876629590988, max_abs=0.041015625
production_forward grad[0] vs paper_forward: mean_abs=0.007991256192326546, max_abs=0.359375, mean_rel=0.07077689468860626, max_rel=109.62318420410156, norm_rel=0.019296979531645775, ref_abs_avg=0.4474659860134125, test_abs_avg=0.447476327419281
production_forward grad[1] vs paper_forward: mean_abs=6.849826812744141, max_abs=64.0, mean_rel=0.3895452320575714, max_rel=2559.982177734375, norm_rel=0.01931925304234028, ref_abs_avg=312.1807556152344, test_abs_avg=312.15850830078125
production_forward grad[2] vs paper_forward: mean_abs=1.235703468322754, max_abs=4.5, mean_rel=0.13732172548770905, max_rel=29.110074996948242, norm_rel=0.02431359328329563, ref_abs_avg=49.95672607421875, test_abs_avg=49.97590637207031
production_forward grad[3] vs paper_forward: mean_abs=1.5318742990493774, max_abs=11.0, mean_rel=0.17594246566295624, max_rel=2746.7255859375, norm_rel=0.02409060113132, ref_abs_avg=63.98973083496094, test_abs_avg=63.99421691894531
production_forward grad[4] vs paper_forward: mean_abs=1.4005672931671143, max_abs=9.5, mean_rel=0.32066619396209717, max_rel=3593.749755859375, norm_rel=0.022210262715816498, ref_abs_avg=63.489173889160156, test_abs_avg=63.50056457519531
production_forward grad[5] vs paper_forward: mean_abs=1.0103120803833008, max_abs=4.25, mean_rel=0.10353896021842957, max_rel=6.576303005218506, norm_rel=0.022272279486060143, ref_abs_avg=45.798519134521484, test_abs_avg=45.781532287597656
production_forward grad[6] vs paper_forward: mean_abs=1.3250703811645508, max_abs=9.0, mean_rel=0.16341622173786163, max_rel=1472.4808349609375, norm_rel=0.02382754161953926, ref_abs_avg=56.01063537597656, test_abs_avg=56.012939453125
production_forward grad[7] vs paper_forward: mean_abs=1.21257746219635, max_abs=8.0, mean_rel=0.42159172892570496, max_rel=3624.999755859375, norm_rel=0.022011814638972282, ref_abs_avg=55.39360809326172, test_abs_avg=55.402740478515625
production_forward grad[8] vs paper_forward: mean_abs=0.9453344345092773, max_abs=4.375, mean_rel=0.14334197342395782, max_rel=14.389866828918457, norm_rel=0.022794000804424286, ref_abs_avg=42.1494026184082, test_abs_avg=42.19422149658203
production_forward grad[9] vs paper_forward: mean_abs=1.2091327905654907, max_abs=9.0, mean_rel=0.16369768977165222, max_rel=1352.346435546875, norm_rel=0.02364920638501644, ref_abs_avg=51.49037170410156, test_abs_avg=51.49275207519531
production_forward grad[10] vs paper_forward: mean_abs=1.1024250984191895, max_abs=6.25, mean_rel=0.3344132900238037, max_rel=2999.999755859375, norm_rel=0.0218329019844532, ref_abs_avg=50.795597076416016, test_abs_avg=50.805931091308594
production_forward grad[11] vs paper_forward: mean_abs=0.8640033006668091, max_abs=3.5, mean_rel=0.0978500247001648, max_rel=18.136198043823242, norm_rel=0.023579565808176994, ref_abs_avg=37.748252868652344, test_abs_avg=37.7128791809082
production_forward grad[12] vs paper_forward: mean_abs=1.1248600482940674, max_abs=8.0, mean_rel=0.15037104487419128, max_rel=1119.7464599609375, norm_rel=0.02345442585647106, ref_abs_avg=48.29217529296875, test_abs_avg=48.29603958129883
production_forward grad[13] vs paper_forward: mean_abs=1.0280351638793945, max_abs=6.5, mean_rel=0.38039490580558777, max_rel=4187.5, norm_rel=0.021755464375019073, ref_abs_avg=47.478492736816406, test_abs_avg=47.48486328125
production_forward grad[14] vs paper_forward: mean_abs=0.8047409057617188, max_abs=3.125, mean_rel=0.07197169959545135, max_rel=3.233205795288086, norm_rel=0.021641507744789124, ref_abs_avg=37.344818115234375, test_abs_avg=37.26249694824219
production_forward grad[15] vs paper_forward: mean_abs=1.0486541986465454, max_abs=8.0, mean_rel=0.15828156471252441, max_rel=2411.558837890625, norm_rel=0.023286757990717888, ref_abs_avg=45.29884338378906, test_abs_avg=45.30219268798828
production_forward grad[16] vs paper_forward: mean_abs=0.9588913917541504, max_abs=5.5, mean_rel=0.33799612522125244, max_rel=2999.999755859375, norm_rel=0.02164231240749359, ref_abs_avg=44.584564208984375, test_abs_avg=44.587135314941406
production_forward grad[17] vs paper_forward: mean_abs=0.7459619045257568, max_abs=3.25, mean_rel=0.11676369607448578, max_rel=7.830923557281494, norm_rel=0.02249804511666298, ref_abs_avg=33.45771789550781, test_abs_avg=33.44586181640625
production_forward grad[18] vs paper_forward: mean_abs=0.981658935546875, max_abs=7.0, mean_rel=0.1546257734298706, max_rel=1455.14013671875, norm_rel=0.023185906931757927, ref_abs_avg=42.624534606933594, test_abs_avg=42.62331771850586
production_forward grad[19] vs paper_forward: mean_abs=0.8985117077827454, max_abs=5.5, mean_rel=0.3250548243522644, max_rel=2968.749755859375, norm_rel=0.02155117318034172, ref_abs_avg=41.87931823730469, test_abs_avg=41.88311767578125
production_forward grad[20] vs paper_forward: mean_abs=0.6998763084411621, max_abs=3.25, mean_rel=0.1827629804611206, max_rel=47.13004684448242, norm_rel=0.020960867404937744, ref_abs_avg=33.931922912597656, test_abs_avg=33.924556732177734
production_forward grad[21] vs paper_forward: mean_abs=0.928726077079773, max_abs=7.0, mean_rel=0.15008202195167542, max_rel=1698.62109375, norm_rel=0.023106655105948448, ref_abs_avg=40.470191955566406, test_abs_avg=40.46965026855469
production_forward grad[22] vs paper_forward: mean_abs=0.8503208160400391, max_abs=5.5, mean_rel=0.28791114687919617, max_rel=2406.25, norm_rel=0.0214138962328434, ref_abs_avg=39.88436508178711, test_abs_avg=39.88603973388672
production_forward grad[23] vs paper_forward: mean_abs=0.6506868004798889, max_abs=2.375, mean_rel=0.11132541298866272, max_rel=15.334622383117676, norm_rel=0.02008209004998207, ref_abs_avg=31.831401824951172, test_abs_avg=31.823501586914062
production_forward grad[24] vs paper_forward: mean_abs=0.8868436813354492, max_abs=6.0, mean_rel=0.1485045999288559, max_rel=791.5115966796875, norm_rel=0.022881099954247475, ref_abs_avg=39.002845764160156, test_abs_avg=39.00376510620117
production_forward grad[25] vs paper_forward: mean_abs=0.8100370168685913, max_abs=5.0, mean_rel=0.26967570185661316, max_rel=2375.0, norm_rel=0.021064789965748787, ref_abs_avg=38.643280029296875, test_abs_avg=38.6390380859375
production_forward grad[26] vs paper_forward: mean_abs=0.7708530426025391, max_abs=3.0, mean_rel=0.11338148266077042, max_rel=7.8039116859436035, norm_rel=0.02368774823844433, ref_abs_avg=32.38740158081055, test_abs_avg=32.406394958496094
production_forward grad[27] vs paper_forward: mean_abs=1.009087324142456, max_abs=7.25, mean_rel=0.18293915688991547, max_rel=2383.85107421875, norm_rel=0.024687504395842552, ref_abs_avg=41.104793548583984, test_abs_avg=41.11111068725586
production_forward grad[28] vs paper_forward: mean_abs=0.9321367740631104, max_abs=5.625, mean_rel=0.3023524880409241, max_rel=2874.999755859375, norm_rel=0.02294967696070671, ref_abs_avg=40.81169891357422, test_abs_avg=40.813011169433594
production_forward grad[29] vs paper_forward: mean_abs=0.7936923503875732, max_abs=3.0, mean_rel=0.17967557907104492, max_rel=21.877246856689453, norm_rel=0.025410234928131104, ref_abs_avg=30.43256378173828, test_abs_avg=30.42848014831543
production_forward grad[30] vs paper_forward: mean_abs=0.9417569637298584, max_abs=8.0, mean_rel=0.16370825469493866, max_rel=1721.6417236328125, norm_rel=0.025086838752031326, ref_abs_avg=37.733734130859375, test_abs_avg=37.73671340942383
production_forward grad[31] vs paper_forward: mean_abs=0.882937490940094, max_abs=6.0, mean_rel=0.3162538707256317, max_rel=2375.0, norm_rel=0.023811237886548042, ref_abs_avg=37.25914001464844, test_abs_avg=37.261985778808594
production_forward grad[32] vs paper_forward: mean_abs=0.7011356353759766, max_abs=2.75, mean_rel=0.08645782619714737, max_rel=5.15634298324585, norm_rel=0.024012627080082893, ref_abs_avg=29.326904296875, test_abs_avg=29.382822036743164
production_forward grad[33] vs paper_forward: mean_abs=0.883716344833374, max_abs=5.875, mean_rel=0.16014865040779114, max_rel=1711.7371826171875, norm_rel=0.02501426637172699, ref_abs_avg=35.517311096191406, test_abs_avg=35.51593017578125
production_forward grad[34] vs paper_forward: mean_abs=0.8211179971694946, max_abs=4.75, mean_rel=0.30419766902923584, max_rel=1999.9998779296875, norm_rel=0.023488448932766914, ref_abs_avg=35.048824310302734, test_abs_avg=35.04670715332031
production_forward grad[35] vs paper_forward: mean_abs=0.6689646244049072, max_abs=2.5625, mean_rel=0.1004219502210617, max_rel=9.445460319519043, norm_rel=0.024840619415044785, ref_abs_avg=27.649303436279297, test_abs_avg=27.65239715576172
production_forward grad[36] vs paper_forward: mean_abs=0.8320024609565735, max_abs=7.0, mean_rel=0.1629858911037445, max_rel=1670.055908203125, norm_rel=0.024959415197372437, ref_abs_avg=33.53178024291992, test_abs_avg=33.530174255371094
production_forward grad[37] vs paper_forward: mean_abs=0.7738316059112549, max_abs=5.0, mean_rel=0.26564204692840576, max_rel=2937.499755859375, norm_rel=0.023384492844343185, ref_abs_avg=33.286224365234375, test_abs_avg=33.29121398925781
production_forward grad[38] vs paper_forward: mean_abs=0.6071064472198486, max_abs=2.283203125, mean_rel=0.17059481143951416, max_rel=34.883026123046875, norm_rel=0.023547329008579254, ref_abs_avg=25.921640396118164, test_abs_avg=25.915712356567383
production_forward grad[39] vs paper_forward: mean_abs=0.7876396179199219, max_abs=5.0, mean_rel=0.15152449905872345, max_rel=914.9249267578125, norm_rel=0.024629246443510056, ref_abs_avg=32.10887908935547, test_abs_avg=32.11180877685547
production_forward grad[40] vs paper_forward: mean_abs=0.7278226613998413, max_abs=4.4375, mean_rel=0.27162835001945496, max_rel=2062.5, norm_rel=0.02301134541630745, ref_abs_avg=31.746898651123047, test_abs_avg=31.75060272216797
production_forward grad[41] vs paper_forward: mean_abs=0.5990256071090698, max_abs=2.375, mean_rel=0.08563782274723053, max_rel=15.701078414916992, norm_rel=0.024526741355657578, ref_abs_avg=25.104482650756836, test_abs_avg=25.19913673400879
production_forward grad[42] vs paper_forward: mean_abs=0.7524082064628601, max_abs=5.0, mean_rel=0.16125108301639557, max_rel=855.0538330078125, norm_rel=0.02438836731016636, ref_abs_avg=30.987443923950195, test_abs_avg=30.98763084411621
production_forward grad[43] vs paper_forward: mean_abs=0.6987869739532471, max_abs=4.7578125, mean_rel=0.26437628269195557, max_rel=2265.625, norm_rel=0.022791940718889236, ref_abs_avg=30.734745025634766, test_abs_avg=30.730606079101562
production_forward grad[44] vs paper_forward: mean_abs=0.5841636657714844, max_abs=2.25, mean_rel=0.06420556455850601, max_rel=5.271716594696045, norm_rel=0.023389440029859543, ref_abs_avg=25.52114486694336, test_abs_avg=25.574058532714844
production_forward grad[45] vs paper_forward: mean_abs=0.7141295075416565, max_abs=4.5, mean_rel=0.15727588534355164, max_rel=1107.0733642578125, norm_rel=0.024081801995635033, ref_abs_avg=29.73104476928711, test_abs_avg=29.73107147216797
production_forward grad[46] vs paper_forward: mean_abs=0.6625820398330688, max_abs=4.0, mean_rel=0.2735578119754791, max_rel=2031.2498779296875, norm_rel=0.02261018566787243, ref_abs_avg=29.360727310180664, test_abs_avg=29.362857818603516
production_forward grad[47] vs paper_forward: mean_abs=0.5465415716171265, max_abs=2.125, mean_rel=0.3792192041873932, max_rel=66.53377532958984, norm_rel=0.023179199546575546, ref_abs_avg=23.63433074951172, test_abs_avg=23.599170684814453
production_forward grad[48] vs paper_forward: mean_abs=0.6810943484306335, max_abs=5.5, mean_rel=0.15426433086395264, max_rel=1124.5782470703125, norm_rel=0.023954002186655998, ref_abs_avg=28.512149810791016, test_abs_avg=28.513931274414062
production_forward grad[49] vs paper_forward: mean_abs=0.6319794654846191, max_abs=4.0, mean_rel=0.2610481381416321, max_rel=1914.0623779296875, norm_rel=0.022388508543372154, ref_abs_avg=28.31332778930664, test_abs_avg=28.311668395996094
production_forward grad[50] vs paper_forward: mean_abs=0.5885086059570312, max_abs=2.078125, mean_rel=0.28956684470176697, max_rel=86.42251586914062, norm_rel=0.024524273350834846, ref_abs_avg=23.950767517089844, test_abs_avg=24.027362823486328
production_forward grad[51] vs paper_forward: mean_abs=0.761661171913147, max_abs=6.5, mean_rel=0.15395863354206085, max_rel=650.2862548828125, norm_rel=0.0253387950360775, ref_abs_avg=30.183094024658203, test_abs_avg=30.181781768798828
production_forward grad[52] vs paper_forward: mean_abs=0.7069550156593323, max_abs=5.0, mean_rel=0.27580487728118896, max_rel=2562.5, norm_rel=0.02399637922644615, ref_abs_avg=29.565296173095703, test_abs_avg=29.56513214111328
production_forward grad[53] vs paper_forward: mean_abs=0.5627869367599487, max_abs=2.5, mean_rel=0.14640314877033234, max_rel=11.352185249328613, norm_rel=0.02455805614590645, ref_abs_avg=22.59943962097168, test_abs_avg=22.568058013916016
production_forward grad[54] vs paper_forward: mean_abs=0.6934982538223267, max_abs=4.5, mean_rel=0.16007034480571747, max_rel=1124.2933349609375, norm_rel=0.025158178061246872, ref_abs_avg=27.63666534423828, test_abs_avg=27.639177322387695
production_forward grad[55] vs paper_forward: mean_abs=0.6464455127716064, max_abs=4.0, mean_rel=0.2602534294128418, max_rel=1781.2498779296875, norm_rel=0.02330983243882656, ref_abs_avg=27.71337890625, test_abs_avg=27.707290649414062
production_forward grad[56] vs paper_forward: mean_abs=0.5104281306266785, max_abs=1.875, mean_rel=0.2716549038887024, max_rel=74.2433090209961, norm_rel=0.022677533328533173, ref_abs_avg=22.06719207763672, test_abs_avg=22.064247131347656
production_forward grad[57] vs paper_forward: mean_abs=0.6525710225105286, max_abs=5.5, mean_rel=0.14850011467933655, max_rel=1005.9197387695312, norm_rel=0.024581292644143105, ref_abs_avg=26.63531494140625, test_abs_avg=26.635120391845703
production_forward grad[58] vs paper_forward: mean_abs=0.604469358921051, max_abs=3.875, mean_rel=0.25753527879714966, max_rel=1687.4998779296875, norm_rel=0.022800981998443604, ref_abs_avg=26.509746551513672, test_abs_avg=26.511091232299805
production_forward grad[59] vs paper_forward: mean_abs=0.45397472381591797, max_abs=1.96875, mean_rel=0.09379662573337555, max_rel=7.962927341461182, norm_rel=0.021829988807439804, ref_abs_avg=20.95742416381836, test_abs_avg=20.955810546875
production_forward grad[60] vs paper_forward: mean_abs=0.6135246157646179, max_abs=5.0, mean_rel=0.15376362204551697, max_rel=905.484375, norm_rel=0.024046765640378, ref_abs_avg=25.574947357177734, test_abs_avg=25.575132369995117
production_forward grad[61] vs paper_forward: mean_abs=0.571082353591919, max_abs=3.875, mean_rel=0.2394377887248993, max_rel=2031.2498779296875, norm_rel=0.022706272080540657, ref_abs_avg=25.189273834228516, test_abs_avg=25.19271469116211
production_forward grad[62] vs paper_forward: mean_abs=0.4813356399536133, max_abs=1.810546875, mean_rel=0.08132763206958771, max_rel=1.966794729232788, norm_rel=0.023881062865257263, ref_abs_avg=20.03789710998535, test_abs_avg=19.990047454833984
production_forward grad[63] vs paper_forward: mean_abs=0.5828136205673218, max_abs=4.5, mean_rel=0.15310761332511902, max_rel=881.3025512695312, norm_rel=0.023991694673895836, ref_abs_avg=24.356826782226562, test_abs_avg=24.3568058013916
production_forward grad[64] vs paper_forward: mean_abs=0.541096568107605, max_abs=3.640625, mean_rel=0.21987155079841614, max_rel=1406.2498779296875, norm_rel=0.022236965596675873, ref_abs_avg=24.29850959777832, test_abs_avg=24.285659790039062
production_forward grad[65] vs paper_forward: mean_abs=0.4189877510070801, max_abs=1.796875, mean_rel=0.1311453878879547, max_rel=15.116660118103027, norm_rel=0.02070089429616928, ref_abs_avg=20.149154663085938, test_abs_avg=20.142873764038086
production_forward grad[66] vs paper_forward: mean_abs=0.5488709211349487, max_abs=4.0, mean_rel=0.14412543177604675, max_rel=888.4699096679688, norm_rel=0.02334894798696041, ref_abs_avg=23.551830291748047, test_abs_avg=23.553260803222656
production_forward grad[67] vs paper_forward: mean_abs=0.5078493356704712, max_abs=3.625, mean_rel=0.20572373270988464, max_rel=1718.7498779296875, norm_rel=0.02204020693898201, ref_abs_avg=23.062036514282227, test_abs_avg=23.055347442626953
production_forward grad[68] vs paper_forward: mean_abs=0.3983802795410156, max_abs=1.5, mean_rel=0.0849686786532402, max_rel=5.258861064910889, norm_rel=0.02145632542669773, ref_abs_avg=19.03464126586914, test_abs_avg=19.00490951538086
production_forward grad[69] vs paper_forward: mean_abs=0.5250870585441589, max_abs=5.0, mean_rel=0.14491060376167297, max_rel=885.2892456054688, norm_rel=0.023185377940535545, ref_abs_avg=22.655019760131836, test_abs_avg=22.6541748046875
production_forward grad[70] vs paper_forward: mean_abs=0.4826560616493225, max_abs=3.4375, mean_rel=0.2104124128818512, max_rel=1749.9998779296875, norm_rel=0.0212597344070673, ref_abs_avg=22.674694061279297, test_abs_avg=22.67238998413086
production_forward grad[71] vs paper_forward: mean_abs=0.37903594970703125, max_abs=1.625, mean_rel=0.06622505187988281, max_rel=2.6126809120178223, norm_rel=0.019725989550352097, ref_abs_avg=19.406997680664062, test_abs_avg=19.374704360961914
production_forward grad[72] vs paper_forward: mean_abs=0.5010420083999634, max_abs=4.0, mean_rel=0.14193086326122284, max_rel=955.0009765625, norm_rel=0.022592177614569664, ref_abs_avg=22.227554321289062, test_abs_avg=22.225570678710938
production_forward grad[73] vs paper_forward: mean_abs=0.4574907720088959, max_abs=3.5, mean_rel=0.22336798906326294, max_rel=2953.124755859375, norm_rel=0.021044427528977394, ref_abs_avg=21.710683822631836, test_abs_avg=21.708148956298828
production_forward grad[74] vs paper_forward: mean_abs=0.42830610275268555, max_abs=1.5, mean_rel=0.08709719032049179, max_rel=9.282129287719727, norm_rel=0.02099594846367836, ref_abs_avg=20.071014404296875, test_abs_avg=20.09124755859375
production_forward grad[75] vs paper_forward: mean_abs=0.55229252576828, max_abs=5.0, mean_rel=0.14493289589881897, max_rel=932.7471313476562, norm_rel=0.023905323818325996, ref_abs_avg=23.182941436767578, test_abs_avg=23.18185806274414
production_forward grad[76] vs paper_forward: mean_abs=0.514480471611023, max_abs=3.6875, mean_rel=0.2153533548116684, max_rel=1406.2498779296875, norm_rel=0.022298775613307953, ref_abs_avg=23.14560317993164, test_abs_avg=23.14116096496582
production_forward grad[77] vs paper_forward: mean_abs=0.43393588066101074, max_abs=1.5, mean_rel=0.26674944162368774, max_rel=84.2615966796875, norm_rel=0.02308546006679535, ref_abs_avg=18.440889358520508, test_abs_avg=18.410045623779297
production_forward grad[78] vs paper_forward: mean_abs=0.5161359310150146, max_abs=4.5, mean_rel=0.14803683757781982, max_rel=919.049072265625, norm_rel=0.02343081310391426, ref_abs_avg=22.082088470458984, test_abs_avg=22.082706451416016
production_forward grad[79] vs paper_forward: mean_abs=0.47299909591674805, max_abs=3.5, mean_rel=0.19502457976341248, max_rel=1406.2498779296875, norm_rel=0.02205968275666237, ref_abs_avg=21.508865356445312, test_abs_avg=21.509166717529297
production_forward grad[80] vs paper_forward: mean_abs=0.36762094497680664, max_abs=1.5, mean_rel=0.08740536868572235, max_rel=9.375608444213867, norm_rel=0.022170038893818855, ref_abs_avg=16.764415740966797, test_abs_avg=16.780704498291016
production_forward grad[81] vs paper_forward: mean_abs=0.4741615056991577, max_abs=5.0, mean_rel=0.14345607161521912, max_rel=738.9680786132812, norm_rel=0.022899530827999115, ref_abs_avg=20.74722671508789, test_abs_avg=20.747314453125
production_forward grad[82] vs paper_forward: mean_abs=0.4347360134124756, max_abs=3.625, mean_rel=0.2100261151790619, max_rel=1499.9998779296875, norm_rel=0.02110302820801735, ref_abs_avg=20.62359046936035, test_abs_avg=20.632047653198242
production_forward grad[83] vs paper_forward: mean_abs=0.34834039211273193, max_abs=1.5, mean_rel=0.16009356081485748, max_rel=21.59204864501953, norm_rel=0.02047465555369854, ref_abs_avg=16.836017608642578, test_abs_avg=16.829429626464844
production_forward grad[84] vs paper_forward: mean_abs=0.4453967213630676, max_abs=5.0, mean_rel=0.13514816761016846, max_rel=1094.46630859375, norm_rel=0.022433750331401825, ref_abs_avg=19.935256958007812, test_abs_avg=19.93401336669922
production_forward grad[85] vs paper_forward: mean_abs=0.4050225615501404, max_abs=3.46875, mean_rel=0.20825523138046265, max_rel=1125.0, norm_rel=0.020743248984217644, ref_abs_avg=19.62401580810547, test_abs_avg=19.614974975585938
production_forward grad[86] vs paper_forward: mean_abs=0.33206993341445923, max_abs=1.25, mean_rel=0.2794174551963806, max_rel=109.55994415283203, norm_rel=0.0212470144033432, ref_abs_avg=15.685928344726562, test_abs_avg=15.70103645324707
production_forward grad[87] vs paper_forward: mean_abs=0.41600143909454346, max_abs=4.0, mean_rel=0.13582076132297516, max_rel=834.6943969726562, norm_rel=0.021648865193128586, ref_abs_avg=19.35485076904297, test_abs_avg=19.354522705078125
production_forward grad[88] vs paper_forward: mean_abs=0.38097044825553894, max_abs=3.5625, mean_rel=0.16445371508598328, max_rel=1109.375, norm_rel=0.01938813365995884, ref_abs_avg=19.675987243652344, test_abs_avg=19.674938201904297
production_forward grad[89] vs paper_forward: mean_abs=0.3102610111236572, max_abs=1.625, mean_rel=0.23847411572933197, max_rel=64.7461929321289, norm_rel=0.019590437412261963, ref_abs_avg=15.819189071655273, test_abs_avg=15.846271514892578
production_forward grad[90] vs paper_forward: mean_abs=0.40484344959259033, max_abs=4.5, mean_rel=0.1310211569070816, max_rel=609.8453979492188, norm_rel=0.02145964838564396, ref_abs_avg=19.026586532592773, test_abs_avg=19.025909423828125
production_forward grad[91] vs paper_forward: mean_abs=0.3615040183067322, max_abs=4.25, mean_rel=0.15096093714237213, max_rel=874.9999389648438, norm_rel=0.019761638715863228, ref_abs_avg=18.506004333496094, test_abs_avg=18.51047134399414
production_forward grad[92] vs paper_forward: mean_abs=0.29337453842163086, max_abs=1.0625, mean_rel=0.05881282314658165, max_rel=3.3062562942504883, norm_rel=0.01832285150885582, ref_abs_avg=16.213098526000977, test_abs_avg=16.18859100341797
production_forward grad[93] vs paper_forward: mean_abs=0.37650996446609497, max_abs=4.0, mean_rel=0.1250913143157959, max_rel=549.5801391601562, norm_rel=0.02085096761584282, ref_abs_avg=18.272499084472656, test_abs_avg=18.272598266601562
production_forward grad[94] vs paper_forward: mean_abs=0.33221861720085144, max_abs=3.0, mean_rel=0.1547526866197586, max_rel=1328.1248779296875, norm_rel=0.018916215747594833, ref_abs_avg=17.811420440673828, test_abs_avg=17.821529388427734
production_forward grad[95] vs paper_forward: mean_abs=0.2953453063964844, max_abs=1.125, mean_rel=0.05940317362546921, max_rel=2.12803316116333, norm_rel=0.019109800457954407, ref_abs_avg=15.829227447509766, test_abs_avg=15.832965850830078
production_forward grad[96] vs paper_forward: mean_abs=0.36464250087738037, max_abs=5.5, mean_rel=0.139136403799057, max_rel=1013.2445068359375, norm_rel=0.02069116197526455, ref_abs_avg=17.929426193237305, test_abs_avg=17.93023109436035
production_forward grad[97] vs paper_forward: mean_abs=0.3205823302268982, max_abs=3.5, mean_rel=0.18576085567474365, max_rel=1312.4998779296875, norm_rel=0.01929895021021366, ref_abs_avg=16.957149505615234, test_abs_avg=16.95671844482422
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016097412444651127, max_abs=0.041015625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008333787322044373, max_abs=0.375, mean_rel=0.07342377305030823, max_rel=138.91192626953125, norm_rel=0.020013555884361267, ref_abs_avg=0.4474659860134125, test_abs_avg=0.44746333360671997
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.043440341949463, max_abs=52.0, mean_rel=0.286907821893692, max_rel=1683.029541015625, norm_rel=0.0198199599981308, ref_abs_avg=312.1807556152344, test_abs_avg=312.16876220703125
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2545671463012695, max_abs=4.046875, mean_rel=0.156778484582901, max_rel=40.278499603271484, norm_rel=0.024938397109508514, ref_abs_avg=49.95672607421875, test_abs_avg=49.96107864379883
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.5806041955947876, max_abs=12.0, mean_rel=0.17707665264606476, max_rel=2066.505859375, norm_rel=0.024849029257893562, ref_abs_avg=63.98973083496094, test_abs_avg=63.992652893066406
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.4505736827850342, max_abs=10.5, mean_rel=0.355327844619751, max_rel=3781.249755859375, norm_rel=0.022999323904514313, ref_abs_avg=63.489173889160156, test_abs_avg=63.48729705810547
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.057326316833496, max_abs=4.0, mean_rel=0.10689444839954376, max_rel=7.982998847961426, norm_rel=0.0227633249014616, ref_abs_avg=45.798519134521484, test_abs_avg=45.75641632080078
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3688280582427979, max_abs=10.0, mean_rel=0.1743684709072113, max_rel=1695.213134765625, norm_rel=0.024593139067292213, ref_abs_avg=56.01063537597656, test_abs_avg=56.0081901550293
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.2610782384872437, max_abs=8.25, mean_rel=0.3933274745941162, max_rel=4750.0, norm_rel=0.02284749038517475, ref_abs_avg=55.39360809326172, test_abs_avg=55.39888000488281
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.997962474822998, max_abs=4.75, mean_rel=0.17762494087219238, max_rel=32.5611686706543, norm_rel=0.02379590831696987, ref_abs_avg=42.1494026184082, test_abs_avg=42.19715118408203
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.2476515769958496, max_abs=8.0, mean_rel=0.1705923080444336, max_rel=1724.3919677734375, norm_rel=0.02438313513994217, ref_abs_avg=51.49037170410156, test_abs_avg=51.491207122802734
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1457977294921875, max_abs=7.5, mean_rel=0.32014545798301697, max_rel=4562.5, norm_rel=0.022695261985063553, ref_abs_avg=50.795597076416016, test_abs_avg=50.80470275878906
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.8724991083145142, max_abs=3.5, mean_rel=0.21976114809513092, max_rel=58.28168869018555, norm_rel=0.02384158968925476, ref_abs_avg=37.748252868652344, test_abs_avg=37.70714569091797
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.158202886581421, max_abs=8.0, mean_rel=0.15664084255695343, max_rel=1094.71630859375, norm_rel=0.024132607504725456, ref_abs_avg=48.29217529296875, test_abs_avg=48.293739318847656
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.0618350505828857, max_abs=6.25, mean_rel=0.35293614864349365, max_rel=3937.499755859375, norm_rel=0.022470204159617424, ref_abs_avg=47.478492736816406, test_abs_avg=47.48478698730469
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8360735177993774, max_abs=3.625, mean_rel=0.07183573395013809, max_rel=4.339757442474365, norm_rel=0.022512687370181084, ref_abs_avg=37.344818115234375, test_abs_avg=37.238243103027344
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.07766592502594, max_abs=8.5, mean_rel=0.15821360051631927, max_rel=2464.556884765625, norm_rel=0.02394738793373108, ref_abs_avg=45.29884338378906, test_abs_avg=45.29985046386719
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.9887627363204956, max_abs=6.25, mean_rel=0.3500524163246155, max_rel=2999.999755859375, norm_rel=0.022301962599158287, ref_abs_avg=44.584564208984375, test_abs_avg=44.58503723144531
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.7852494716644287, max_abs=3.25, mean_rel=0.173049196600914, max_rel=38.24061584472656, norm_rel=0.023730043321847916, ref_abs_avg=33.45771789550781, test_abs_avg=33.487762451171875
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0055527687072754, max_abs=6.5, mean_rel=0.15214912593364716, max_rel=985.7774047851562, norm_rel=0.023750854656100273, ref_abs_avg=42.624534606933594, test_abs_avg=42.62071228027344
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9214333295822144, max_abs=5.8125, mean_rel=0.3334996998310089, max_rel=2906.249755859375, norm_rel=0.022095633670687675, ref_abs_avg=41.87931823730469, test_abs_avg=41.87958526611328
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7180960178375244, max_abs=2.875, mean_rel=0.18162578344345093, max_rel=43.36837387084961, norm_rel=0.021590013056993484, ref_abs_avg=33.931922912597656, test_abs_avg=33.89765930175781
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.9501988887786865, max_abs=8.0, mean_rel=0.15136748552322388, max_rel=1860.033447265625, norm_rel=0.023635977879166603, ref_abs_avg=40.470191955566406, test_abs_avg=40.46730041503906
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.8758261799812317, max_abs=6.0, mean_rel=0.2866027057170868, max_rel=3249.999755859375, norm_rel=0.022043684497475624, ref_abs_avg=39.88436508178711, test_abs_avg=39.88768005371094
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.6540923118591309, max_abs=2.5, mean_rel=0.11935117840766907, max_rel=26.982847213745117, norm_rel=0.020375672727823257, ref_abs_avg=31.831401824951172, test_abs_avg=31.830774307250977
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.906840980052948, max_abs=6.0, mean_rel=0.15632511675357819, max_rel=1194.174560546875, norm_rel=0.023378951475024223, ref_abs_avg=39.002845764160156, test_abs_avg=39.00220489501953
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.831322431564331, max_abs=5.0, mean_rel=0.2956881523132324, max_rel=2312.5, norm_rel=0.021615399047732353, ref_abs_avg=38.643280029296875, test_abs_avg=38.640892028808594
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8125886917114258, max_abs=3.625, mean_rel=0.10540984570980072, max_rel=8.562032699584961, norm_rel=0.025153661146759987, ref_abs_avg=32.38740158081055, test_abs_avg=32.36772155761719
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0331048965454102, max_abs=7.0, mean_rel=0.1876049041748047, max_rel=2949.477783203125, norm_rel=0.025266341865062714, ref_abs_avg=41.104793548583984, test_abs_avg=41.10930252075195
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.9588702917098999, max_abs=6.0, mean_rel=0.30399197340011597, max_rel=2937.499755859375, norm_rel=0.023606199771165848, ref_abs_avg=40.81169891357422, test_abs_avg=40.807777404785156
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.821664571762085, max_abs=3.125, mean_rel=0.21919436752796173, max_rel=48.432281494140625, norm_rel=0.026219191029667854, ref_abs_avg=30.43256378173828, test_abs_avg=30.415645599365234
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9619618654251099, max_abs=7.0, mean_rel=0.17124326527118683, max_rel=1775.99609375, norm_rel=0.025608345866203308, ref_abs_avg=37.733734130859375, test_abs_avg=37.73514175415039
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9036403894424438, max_abs=6.046875, mean_rel=0.31296566128730774, max_rel=2437.5, norm_rel=0.02435031719505787, ref_abs_avg=37.25914001464844, test_abs_avg=37.259498596191406
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7150219678878784, max_abs=3.25, mean_rel=0.09044529497623444, max_rel=6.325164318084717, norm_rel=0.024366814643144608, ref_abs_avg=29.326904296875, test_abs_avg=29.375621795654297
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9035990238189697, max_abs=6.0, mean_rel=0.16131925582885742, max_rel=2309.85595703125, norm_rel=0.02556246519088745, ref_abs_avg=35.517311096191406, test_abs_avg=35.51378631591797
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.840349555015564, max_abs=5.25, mean_rel=0.2991539537906647, max_rel=2250.0, norm_rel=0.024046463891863823, ref_abs_avg=35.048824310302734, test_abs_avg=35.04481887817383
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6619682312011719, max_abs=2.625, mean_rel=0.08532222360372543, max_rel=5.732023239135742, norm_rel=0.02447483129799366, ref_abs_avg=27.649303436279297, test_abs_avg=27.648929595947266
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.84710693359375, max_abs=7.0, mean_rel=0.16521745920181274, max_rel=1150.5477294921875, norm_rel=0.02538921684026718, ref_abs_avg=33.53178024291992, test_abs_avg=33.53007507324219
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.791228175163269, max_abs=4.75, mean_rel=0.27002185583114624, max_rel=2812.499755859375, norm_rel=0.02388891763985157, ref_abs_avg=33.286224365234375, test_abs_avg=33.28997039794922
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6490268707275391, max_abs=2.5, mean_rel=0.18899424374103546, max_rel=37.74767303466797, norm_rel=0.02548857405781746, ref_abs_avg=25.921640396118164, test_abs_avg=25.92083740234375
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8022886514663696, max_abs=6.0, mean_rel=0.15460804104804993, max_rel=1264.0103759765625, norm_rel=0.025082089006900787, ref_abs_avg=32.10887908935547, test_abs_avg=32.11021423339844
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7440874576568604, max_abs=4.5, mean_rel=0.2885676622390747, max_rel=1999.9998779296875, norm_rel=0.023504391312599182, ref_abs_avg=31.746898651123047, test_abs_avg=31.75044059753418
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6015068292617798, max_abs=2.5, mean_rel=0.09416038542985916, max_rel=19.050535202026367, norm_rel=0.02446141093969345, ref_abs_avg=25.104482650756836, test_abs_avg=25.17319107055664
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.763860821723938, max_abs=6.0, mean_rel=0.16427971422672272, max_rel=1151.2989501953125, norm_rel=0.024765878915786743, ref_abs_avg=30.987443923950195, test_abs_avg=30.987268447875977
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7115097045898438, max_abs=4.5, mean_rel=0.27507057785987854, max_rel=2500.0, norm_rel=0.02322113700211048, ref_abs_avg=30.734745025634766, test_abs_avg=30.7301025390625
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5611416101455688, max_abs=2.75, mean_rel=0.05979456007480621, max_rel=4.53734827041626, norm_rel=0.022961976006627083, ref_abs_avg=25.52114486694336, test_abs_avg=25.560104370117188
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7247729897499084, max_abs=5.5, mean_rel=0.15606689453125, max_rel=778.771240234375, norm_rel=0.024444198235869408, ref_abs_avg=29.73104476928711, test_abs_avg=29.731433868408203
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6752780079841614, max_abs=4.0, mean_rel=0.2756226658821106, max_rel=2375.0, norm_rel=0.023031553253531456, ref_abs_avg=29.360727310180664, test_abs_avg=29.36213493347168
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.557916522026062, max_abs=2.25, mean_rel=0.34104400873184204, max_rel=67.4984130859375, norm_rel=0.02340761013329029, ref_abs_avg=23.63433074951172, test_abs_avg=23.63056755065918
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6899218559265137, max_abs=5.0, mean_rel=0.15726789832115173, max_rel=737.4136352539062, norm_rel=0.02426508627831936, ref_abs_avg=28.512149810791016, test_abs_avg=28.512248992919922
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6417238712310791, max_abs=4.25, mean_rel=0.2580905258655548, max_rel=2046.8748779296875, norm_rel=0.02272140607237816, ref_abs_avg=28.31332778930664, test_abs_avg=28.309738159179688
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.5812673568725586, max_abs=2.125, mean_rel=0.4136595129966736, max_rel=150.94915771484375, norm_rel=0.024519795551896095, ref_abs_avg=23.950767517089844, test_abs_avg=24.03545379638672
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7730911374092102, max_abs=6.0, mean_rel=0.16078850626945496, max_rel=1105.9990234375, norm_rel=0.02572043612599373, ref_abs_avg=30.183094024658203, test_abs_avg=30.18270492553711
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7167153358459473, max_abs=4.75, mean_rel=0.2629444897174835, max_rel=2562.5, norm_rel=0.024340469390153885, ref_abs_avg=29.565296173095703, test_abs_avg=29.568401336669922
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.553614616394043, max_abs=2.46875, mean_rel=0.28024184703826904, max_rel=67.78485107421875, norm_rel=0.024557175114750862, ref_abs_avg=22.59943962097168, test_abs_avg=22.547710418701172
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7040528059005737, max_abs=5.5, mean_rel=0.15909647941589355, max_rel=920.3184814453125, norm_rel=0.025540350005030632, ref_abs_avg=27.63666534423828, test_abs_avg=27.638362884521484
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.656879186630249, max_abs=4.25, mean_rel=0.26987671852111816, max_rel=2250.0, norm_rel=0.023684779182076454, ref_abs_avg=27.71337890625, test_abs_avg=27.703502655029297
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5044265389442444, max_abs=2.046875, mean_rel=0.3752845227718353, max_rel=136.16024780273438, norm_rel=0.0226113423705101, ref_abs_avg=22.06719207763672, test_abs_avg=22.055652618408203
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6624940633773804, max_abs=4.625, mean_rel=0.15534478425979614, max_rel=903.1912231445312, norm_rel=0.024927904829382896, ref_abs_avg=26.63531494140625, test_abs_avg=26.63353157043457
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6134214401245117, max_abs=4.0, mean_rel=0.26677778363227844, max_rel=1968.7498779296875, norm_rel=0.0231387410312891, ref_abs_avg=26.509746551513672, test_abs_avg=26.513134002685547
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.44986724853515625, max_abs=1.75, mean_rel=0.09429852664470673, max_rel=8.825078964233398, norm_rel=0.021819839254021645, ref_abs_avg=20.95742416381836, test_abs_avg=20.949310302734375
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6206517219543457, max_abs=5.0, mean_rel=0.15770575404167175, max_rel=724.8175048828125, norm_rel=0.024324163794517517, ref_abs_avg=25.574947357177734, test_abs_avg=25.575912475585938
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.581067681312561, max_abs=3.6875, mean_rel=0.2553609609603882, max_rel=2250.0, norm_rel=0.023112356662750244, ref_abs_avg=25.189273834228516, test_abs_avg=25.193607330322266
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4736182689666748, max_abs=1.875, mean_rel=0.08060665428638458, max_rel=2.2032294273376465, norm_rel=0.02356041595339775, ref_abs_avg=20.03789710998535, test_abs_avg=20.02005386352539
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5887149572372437, max_abs=4.5, mean_rel=0.15750887989997864, max_rel=1051.8817138671875, norm_rel=0.024231260642409325, ref_abs_avg=24.356826782226562, test_abs_avg=24.357328414916992
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5488632917404175, max_abs=3.65625, mean_rel=0.2283330261707306, max_rel=1484.3748779296875, norm_rel=0.022574393078684807, ref_abs_avg=24.29850959777832, test_abs_avg=24.287246704101562
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4308948516845703, max_abs=1.671875, mean_rel=0.12638318538665771, max_rel=9.233419418334961, norm_rel=0.02129298262298107, ref_abs_avg=20.149154663085938, test_abs_avg=20.16535186767578
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5544227361679077, max_abs=4.5, mean_rel=0.14701387286186218, max_rel=1377.2315673828125, norm_rel=0.023583147674798965, ref_abs_avg=23.551830291748047, test_abs_avg=23.552310943603516
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5148879885673523, max_abs=3.375, mean_rel=0.20041552186012268, max_rel=1781.2498779296875, norm_rel=0.02231837995350361, ref_abs_avg=23.062036514282227, test_abs_avg=23.055274963378906
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4266376495361328, max_abs=1.625, mean_rel=0.08256489038467407, max_rel=3.6243603229522705, norm_rel=0.022541403770446777, ref_abs_avg=19.03464126586914, test_abs_avg=19.015159606933594
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5297096967697144, max_abs=5.0, mean_rel=0.1493266373872757, max_rel=868.5174560546875, norm_rel=0.02338782325387001, ref_abs_avg=22.655019760131836, test_abs_avg=22.653690338134766
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.48939967155456543, max_abs=3.5, mean_rel=0.2075527310371399, max_rel=1828.1248779296875, norm_rel=0.021564284339547157, ref_abs_avg=22.674694061279297, test_abs_avg=22.671863555908203
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.38553524017333984, max_abs=1.375, mean_rel=0.07553350180387497, max_rel=6.233535289764404, norm_rel=0.019948648288846016, ref_abs_avg=19.406997680664062, test_abs_avg=19.361976623535156
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5049866437911987, max_abs=6.0, mean_rel=0.1385984569787979, max_rel=541.4483642578125, norm_rel=0.022766755893826485, ref_abs_avg=22.227554321289062, test_abs_avg=22.226219177246094
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4634213447570801, max_abs=3.5, mean_rel=0.22930681705474854, max_rel=2406.25, norm_rel=0.021301474422216415, ref_abs_avg=21.710683822631836, test_abs_avg=21.70675277709961
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.42086708545684814, max_abs=1.75, mean_rel=0.08816847205162048, max_rel=6.53767728805542, norm_rel=0.02057032473385334, ref_abs_avg=20.071014404296875, test_abs_avg=20.08128547668457
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5594033598899841, max_abs=5.5, mean_rel=0.15147991478443146, max_rel=1366.33837890625, norm_rel=0.02419539913535118, ref_abs_avg=23.182941436767578, test_abs_avg=23.181262969970703
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5201194286346436, max_abs=4.0, mean_rel=0.21863675117492676, max_rel=2062.5, norm_rel=0.022512001916766167, ref_abs_avg=23.14560317993164, test_abs_avg=23.141355514526367
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.42575955390930176, max_abs=1.5, mean_rel=0.3140457570552826, max_rel=101.6358413696289, norm_rel=0.022631507366895676, ref_abs_avg=18.440889358520508, test_abs_avg=18.426401138305664
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5214847922325134, max_abs=5.0, mean_rel=0.14860309660434723, max_rel=1063.0599365234375, norm_rel=0.02364465221762657, ref_abs_avg=22.082088470458984, test_abs_avg=22.082937240600586
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4785674214363098, max_abs=3.5, mean_rel=0.19430850446224213, max_rel=1218.75, norm_rel=0.02232605591416359, ref_abs_avg=21.508865356445312, test_abs_avg=21.508848190307617
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3673359751701355, max_abs=1.75, mean_rel=0.11986357718706131, max_rel=20.93480110168457, norm_rel=0.022286534309387207, ref_abs_avg=16.764415740966797, test_abs_avg=16.777931213378906
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.47843673825263977, max_abs=4.71875, mean_rel=0.14383700489997864, max_rel=766.3828125, norm_rel=0.023103883489966393, ref_abs_avg=20.74722671508789, test_abs_avg=20.747577667236328
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4377743899822235, max_abs=3.5, mean_rel=0.20050999522209167, max_rel=1531.2498779296875, norm_rel=0.021253475919365883, ref_abs_avg=20.62359046936035, test_abs_avg=20.629108428955078
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.35658299922943115, max_abs=1.25, mean_rel=0.09240970760583878, max_rel=10.019988059997559, norm_rel=0.021125534549355507, ref_abs_avg=16.836017608642578, test_abs_avg=16.836999893188477
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4497660994529724, max_abs=5.0, mean_rel=0.13424727320671082, max_rel=963.0944213867188, norm_rel=0.022644631564617157, ref_abs_avg=19.935256958007812, test_abs_avg=19.9339599609375
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.41056323051452637, max_abs=3.5625, mean_rel=0.20910406112670898, max_rel=1125.0, norm_rel=0.02102339267730713, ref_abs_avg=19.62401580810547, test_abs_avg=19.617877960205078
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.33051973581314087, max_abs=1.375, mean_rel=0.532930850982666, max_rel=239.1827392578125, norm_rel=0.021412009373307228, ref_abs_avg=15.685928344726562, test_abs_avg=15.700027465820312
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4191094636917114, max_abs=4.0, mean_rel=0.13554337620735168, max_rel=1221.2081298828125, norm_rel=0.0217878557741642, ref_abs_avg=19.35485076904297, test_abs_avg=19.354957580566406
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.3821004033088684, max_abs=3.0, mean_rel=0.16916291415691376, max_rel=1062.5, norm_rel=0.019453518092632294, ref_abs_avg=19.675987243652344, test_abs_avg=19.67529296875
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.31589996814727783, max_abs=1.515625, mean_rel=0.21015839278697968, max_rel=48.685123443603516, norm_rel=0.020129859447479248, ref_abs_avg=15.819189071655273, test_abs_avg=15.835752487182617
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.407084584236145, max_abs=5.0, mean_rel=0.13185781240463257, max_rel=701.0369873046875, norm_rel=0.02155461721122265, ref_abs_avg=19.026586532592773, test_abs_avg=19.026630401611328
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3647764027118683, max_abs=4.5, mean_rel=0.15805360674858093, max_rel=906.2499389648438, norm_rel=0.019941752776503563, ref_abs_avg=18.506004333496094, test_abs_avg=18.51070785522461
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.2973141670227051, max_abs=1.25, mean_rel=0.056468695402145386, max_rel=2.8658294677734375, norm_rel=0.018350014463067055, ref_abs_avg=16.213098526000977, test_abs_avg=16.204227447509766
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.37859269976615906, max_abs=5.0, mean_rel=0.12596824765205383, max_rel=466.8551940917969, norm_rel=0.020960185676813126, ref_abs_avg=18.272499084472656, test_abs_avg=18.272380828857422
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3332482576370239, max_abs=3.0, mean_rel=0.15840190649032593, max_rel=1070.3125, norm_rel=0.018975814804434776, ref_abs_avg=17.811420440673828, test_abs_avg=17.822551727294922
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2849149703979492, max_abs=1.0, mean_rel=0.05518728494644165, max_rel=2.0570147037506104, norm_rel=0.018345633521676064, ref_abs_avg=15.829227447509766, test_abs_avg=15.841519355773926
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3658621311187744, max_abs=5.5, mean_rel=0.13755667209625244, max_rel=949.6578369140625, norm_rel=0.020751170814037323, ref_abs_avg=17.929426193237305, test_abs_avg=17.9305419921875
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.32071349024772644, max_abs=3.25, mean_rel=0.17504814267158508, max_rel=1523.4373779296875, norm_rel=0.01925339736044407, ref_abs_avg=16.957149505615234, test_abs_avg=16.952899932861328
production_forward2 vs paper_forward output: mean_abs=0.0016056876629590988, max_abs=0.041015625
production_forward2 grad[0] vs paper_forward: mean_abs=0.007991256192326546, max_abs=0.359375, mean_rel=0.07077689468860626, max_rel=109.62318420410156, norm_rel=0.019296979531645775, ref_abs_avg=0.4474659860134125, test_abs_avg=0.447476327419281
production_forward2 grad[1] vs paper_forward: mean_abs=6.8500447273254395, max_abs=64.0, mean_rel=0.3895456790924072, max_rel=2559.982177734375, norm_rel=0.01931956596672535, ref_abs_avg=312.1807556152344, test_abs_avg=312.15899658203125
production_forward2 grad[2] vs paper_forward: mean_abs=1.235703468322754, max_abs=4.5, mean_rel=0.13732172548770905, max_rel=29.110074996948242, norm_rel=0.02431359328329563, ref_abs_avg=49.95672607421875, test_abs_avg=49.97590637207031
production_forward2 grad[3] vs paper_forward: mean_abs=1.5318742990493774, max_abs=11.0, mean_rel=0.17594246566295624, max_rel=2746.7255859375, norm_rel=0.02409060113132, ref_abs_avg=63.98973083496094, test_abs_avg=63.99421691894531
production_forward2 grad[4] vs paper_forward: mean_abs=1.4005672931671143, max_abs=9.5, mean_rel=0.32066619396209717, max_rel=3593.749755859375, norm_rel=0.022210262715816498, ref_abs_avg=63.489173889160156, test_abs_avg=63.50056457519531
production_forward2 grad[5] vs paper_forward: mean_abs=1.0103120803833008, max_abs=4.25, mean_rel=0.10353896021842957, max_rel=6.576303005218506, norm_rel=0.022272279486060143, ref_abs_avg=45.798519134521484, test_abs_avg=45.781532287597656
production_forward2 grad[6] vs paper_forward: mean_abs=1.3250703811645508, max_abs=9.0, mean_rel=0.16341622173786163, max_rel=1472.4808349609375, norm_rel=0.02382754161953926, ref_abs_avg=56.01063537597656, test_abs_avg=56.012939453125
production_forward2 grad[7] vs paper_forward: mean_abs=1.21257746219635, max_abs=8.0, mean_rel=0.42159172892570496, max_rel=3624.999755859375, norm_rel=0.022011814638972282, ref_abs_avg=55.39360809326172, test_abs_avg=55.402740478515625
production_forward2 grad[8] vs paper_forward: mean_abs=0.9453344345092773, max_abs=4.375, mean_rel=0.14334197342395782, max_rel=14.389866828918457, norm_rel=0.022794000804424286, ref_abs_avg=42.1494026184082, test_abs_avg=42.19422149658203
production_forward2 grad[9] vs paper_forward: mean_abs=1.2091327905654907, max_abs=9.0, mean_rel=0.16369768977165222, max_rel=1352.346435546875, norm_rel=0.02364920638501644, ref_abs_avg=51.49037170410156, test_abs_avg=51.49275207519531
production_forward2 grad[10] vs paper_forward: mean_abs=1.1024250984191895, max_abs=6.25, mean_rel=0.3344132900238037, max_rel=2999.999755859375, norm_rel=0.0218329019844532, ref_abs_avg=50.795597076416016, test_abs_avg=50.805931091308594
production_forward2 grad[11] vs paper_forward: mean_abs=0.8640033006668091, max_abs=3.5, mean_rel=0.0978500247001648, max_rel=18.136198043823242, norm_rel=0.023579565808176994, ref_abs_avg=37.748252868652344, test_abs_avg=37.7128791809082
production_forward2 grad[12] vs paper_forward: mean_abs=1.1248600482940674, max_abs=8.0, mean_rel=0.15037104487419128, max_rel=1119.7464599609375, norm_rel=0.02345442585647106, ref_abs_avg=48.29217529296875, test_abs_avg=48.29603958129883
production_forward2 grad[13] vs paper_forward: mean_abs=1.0280351638793945, max_abs=6.5, mean_rel=0.38039490580558777, max_rel=4187.5, norm_rel=0.021755464375019073, ref_abs_avg=47.478492736816406, test_abs_avg=47.48486328125
production_forward2 grad[14] vs paper_forward: mean_abs=0.8047409057617188, max_abs=3.125, mean_rel=0.07197169959545135, max_rel=3.233205795288086, norm_rel=0.021641507744789124, ref_abs_avg=37.344818115234375, test_abs_avg=37.26249694824219
production_forward2 grad[15] vs paper_forward: mean_abs=1.0486541986465454, max_abs=8.0, mean_rel=0.15828156471252441, max_rel=2411.558837890625, norm_rel=0.023286757990717888, ref_abs_avg=45.29884338378906, test_abs_avg=45.30219268798828
production_forward2 grad[16] vs paper_forward: mean_abs=0.9588913917541504, max_abs=5.5, mean_rel=0.33799612522125244, max_rel=2999.999755859375, norm_rel=0.02164231240749359, ref_abs_avg=44.584564208984375, test_abs_avg=44.587135314941406
production_forward2 grad[17] vs paper_forward: mean_abs=0.7459619045257568, max_abs=3.25, mean_rel=0.11676369607448578, max_rel=7.830923557281494, norm_rel=0.02249804511666298, ref_abs_avg=33.45771789550781, test_abs_avg=33.44586181640625
production_forward2 grad[18] vs paper_forward: mean_abs=0.981658935546875, max_abs=7.0, mean_rel=0.1546257734298706, max_rel=1455.14013671875, norm_rel=0.023185906931757927, ref_abs_avg=42.624534606933594, test_abs_avg=42.62331771850586
production_forward2 grad[19] vs paper_forward: mean_abs=0.8985117077827454, max_abs=5.5, mean_rel=0.3250548243522644, max_rel=2968.749755859375, norm_rel=0.02155117318034172, ref_abs_avg=41.87931823730469, test_abs_avg=41.88311767578125
production_forward2 grad[20] vs paper_forward: mean_abs=0.6998763084411621, max_abs=3.25, mean_rel=0.1827629804611206, max_rel=47.13004684448242, norm_rel=0.020960867404937744, ref_abs_avg=33.931922912597656, test_abs_avg=33.924556732177734
production_forward2 grad[21] vs paper_forward: mean_abs=0.928726077079773, max_abs=7.0, mean_rel=0.15008202195167542, max_rel=1698.62109375, norm_rel=0.023106655105948448, ref_abs_avg=40.470191955566406, test_abs_avg=40.46965026855469
production_forward2 grad[22] vs paper_forward: mean_abs=0.8503208160400391, max_abs=5.5, mean_rel=0.28791114687919617, max_rel=2406.25, norm_rel=0.0214138962328434, ref_abs_avg=39.88436508178711, test_abs_avg=39.88603973388672
production_forward2 grad[23] vs paper_forward: mean_abs=0.6506868004798889, max_abs=2.375, mean_rel=0.11132541298866272, max_rel=15.334622383117676, norm_rel=0.02008209004998207, ref_abs_avg=31.831401824951172, test_abs_avg=31.823501586914062
production_forward2 grad[24] vs paper_forward: mean_abs=0.8868436813354492, max_abs=6.0, mean_rel=0.1485045999288559, max_rel=791.5115966796875, norm_rel=0.022881099954247475, ref_abs_avg=39.002845764160156, test_abs_avg=39.00376510620117
production_forward2 grad[25] vs paper_forward: mean_abs=0.8100370168685913, max_abs=5.0, mean_rel=0.26967570185661316, max_rel=2375.0, norm_rel=0.021064789965748787, ref_abs_avg=38.643280029296875, test_abs_avg=38.6390380859375
production_forward2 grad[26] vs paper_forward: mean_abs=0.7708530426025391, max_abs=3.0, mean_rel=0.11338148266077042, max_rel=7.8039116859436035, norm_rel=0.02368774823844433, ref_abs_avg=32.38740158081055, test_abs_avg=32.406394958496094
production_forward2 grad[27] vs paper_forward: mean_abs=1.009087324142456, max_abs=7.25, mean_rel=0.18293915688991547, max_rel=2383.85107421875, norm_rel=0.024687504395842552, ref_abs_avg=41.104793548583984, test_abs_avg=41.11111068725586
production_forward2 grad[28] vs paper_forward: mean_abs=0.9321367740631104, max_abs=5.625, mean_rel=0.3023524880409241, max_rel=2874.999755859375, norm_rel=0.02294967696070671, ref_abs_avg=40.81169891357422, test_abs_avg=40.813011169433594
production_forward2 grad[29] vs paper_forward: mean_abs=0.7936923503875732, max_abs=3.0, mean_rel=0.17967557907104492, max_rel=21.877246856689453, norm_rel=0.025410234928131104, ref_abs_avg=30.43256378173828, test_abs_avg=30.42848014831543
production_forward2 grad[30] vs paper_forward: mean_abs=0.9417569637298584, max_abs=8.0, mean_rel=0.16370825469493866, max_rel=1721.6417236328125, norm_rel=0.025086838752031326, ref_abs_avg=37.733734130859375, test_abs_avg=37.73671340942383
production_forward2 grad[31] vs paper_forward: mean_abs=0.882937490940094, max_abs=6.0, mean_rel=0.3162538707256317, max_rel=2375.0, norm_rel=0.023811237886548042, ref_abs_avg=37.25914001464844, test_abs_avg=37.261985778808594
production_forward2 grad[32] vs paper_forward: mean_abs=0.7011356353759766, max_abs=2.75, mean_rel=0.08645782619714737, max_rel=5.15634298324585, norm_rel=0.024012627080082893, ref_abs_avg=29.326904296875, test_abs_avg=29.382822036743164
production_forward2 grad[33] vs paper_forward: mean_abs=0.883716344833374, max_abs=5.875, mean_rel=0.16014865040779114, max_rel=1711.7371826171875, norm_rel=0.02501426637172699, ref_abs_avg=35.517311096191406, test_abs_avg=35.51593017578125
production_forward2 grad[34] vs paper_forward: mean_abs=0.8211179971694946, max_abs=4.75, mean_rel=0.30419766902923584, max_rel=1999.9998779296875, norm_rel=0.023488448932766914, ref_abs_avg=35.048824310302734, test_abs_avg=35.04670715332031
production_forward2 grad[35] vs paper_forward: mean_abs=0.6689646244049072, max_abs=2.5625, mean_rel=0.1004219502210617, max_rel=9.445460319519043, norm_rel=0.024840619415044785, ref_abs_avg=27.649303436279297, test_abs_avg=27.65239715576172
production_forward2 grad[36] vs paper_forward: mean_abs=0.8320024609565735, max_abs=7.0, mean_rel=0.1629858911037445, max_rel=1670.055908203125, norm_rel=0.024959415197372437, ref_abs_avg=33.53178024291992, test_abs_avg=33.530174255371094
production_forward2 grad[37] vs paper_forward: mean_abs=0.7738316059112549, max_abs=5.0, mean_rel=0.26564204692840576, max_rel=2937.499755859375, norm_rel=0.023384492844343185, ref_abs_avg=33.286224365234375, test_abs_avg=33.29121398925781
production_forward2 grad[38] vs paper_forward: mean_abs=0.6071064472198486, max_abs=2.283203125, mean_rel=0.17059481143951416, max_rel=34.883026123046875, norm_rel=0.023547329008579254, ref_abs_avg=25.921640396118164, test_abs_avg=25.915712356567383
production_forward2 grad[39] vs paper_forward: mean_abs=0.7876396179199219, max_abs=5.0, mean_rel=0.15152449905872345, max_rel=914.9249267578125, norm_rel=0.024629246443510056, ref_abs_avg=32.10887908935547, test_abs_avg=32.11180877685547
production_forward2 grad[40] vs paper_forward: mean_abs=0.7278226613998413, max_abs=4.4375, mean_rel=0.27162835001945496, max_rel=2062.5, norm_rel=0.02301134541630745, ref_abs_avg=31.746898651123047, test_abs_avg=31.75060272216797
production_forward2 grad[41] vs paper_forward: mean_abs=0.5990256071090698, max_abs=2.375, mean_rel=0.08563782274723053, max_rel=15.701078414916992, norm_rel=0.024526741355657578, ref_abs_avg=25.104482650756836, test_abs_avg=25.19913673400879
production_forward2 grad[42] vs paper_forward: mean_abs=0.7524082064628601, max_abs=5.0, mean_rel=0.16125108301639557, max_rel=855.0538330078125, norm_rel=0.02438836731016636, ref_abs_avg=30.987443923950195, test_abs_avg=30.98763084411621
production_forward2 grad[43] vs paper_forward: mean_abs=0.6987869739532471, max_abs=4.7578125, mean_rel=0.26437628269195557, max_rel=2265.625, norm_rel=0.022791940718889236, ref_abs_avg=30.734745025634766, test_abs_avg=30.730606079101562
production_forward2 grad[44] vs paper_forward: mean_abs=0.5841636657714844, max_abs=2.25, mean_rel=0.06420556455850601, max_rel=5.271716594696045, norm_rel=0.023389440029859543, ref_abs_avg=25.52114486694336, test_abs_avg=25.574058532714844
production_forward2 grad[45] vs paper_forward: mean_abs=0.7141295075416565, max_abs=4.5, mean_rel=0.15727588534355164, max_rel=1107.0733642578125, norm_rel=0.024081801995635033, ref_abs_avg=29.73104476928711, test_abs_avg=29.73107147216797
production_forward2 grad[46] vs paper_forward: mean_abs=0.6625820398330688, max_abs=4.0, mean_rel=0.2735578119754791, max_rel=2031.2498779296875, norm_rel=0.02261018566787243, ref_abs_avg=29.360727310180664, test_abs_avg=29.362857818603516
production_forward2 grad[47] vs paper_forward: mean_abs=0.5465415716171265, max_abs=2.125, mean_rel=0.3792192041873932, max_rel=66.53377532958984, norm_rel=0.023179199546575546, ref_abs_avg=23.63433074951172, test_abs_avg=23.599170684814453
production_forward2 grad[48] vs paper_forward: mean_abs=0.6810943484306335, max_abs=5.5, mean_rel=0.15426433086395264, max_rel=1124.5782470703125, norm_rel=0.023954002186655998, ref_abs_avg=28.512149810791016, test_abs_avg=28.513931274414062
production_forward2 grad[49] vs paper_forward: mean_abs=0.6319794654846191, max_abs=4.0, mean_rel=0.2610481381416321, max_rel=1914.0623779296875, norm_rel=0.022388508543372154, ref_abs_avg=28.31332778930664, test_abs_avg=28.311668395996094
production_forward2 grad[50] vs paper_forward: mean_abs=0.5885086059570312, max_abs=2.078125, mean_rel=0.28956684470176697, max_rel=86.42251586914062, norm_rel=0.024524273350834846, ref_abs_avg=23.950767517089844, test_abs_avg=24.027362823486328
production_forward2 grad[51] vs paper_forward: mean_abs=0.761661171913147, max_abs=6.5, mean_rel=0.15395863354206085, max_rel=650.2862548828125, norm_rel=0.0253387950360775, ref_abs_avg=30.183094024658203, test_abs_avg=30.181781768798828
production_forward2 grad[52] vs paper_forward: mean_abs=0.7069550156593323, max_abs=5.0, mean_rel=0.27580487728118896, max_rel=2562.5, norm_rel=0.02399637922644615, ref_abs_avg=29.565296173095703, test_abs_avg=29.56513214111328
production_forward2 grad[53] vs paper_forward: mean_abs=0.5627869367599487, max_abs=2.5, mean_rel=0.14640314877033234, max_rel=11.352185249328613, norm_rel=0.02455805614590645, ref_abs_avg=22.59943962097168, test_abs_avg=22.568058013916016
production_forward2 grad[54] vs paper_forward: mean_abs=0.6934982538223267, max_abs=4.5, mean_rel=0.16007034480571747, max_rel=1124.2933349609375, norm_rel=0.025158178061246872, ref_abs_avg=27.63666534423828, test_abs_avg=27.639177322387695
production_forward2 grad[55] vs paper_forward: mean_abs=0.6464455127716064, max_abs=4.0, mean_rel=0.2602534294128418, max_rel=1781.2498779296875, norm_rel=0.02330983243882656, ref_abs_avg=27.71337890625, test_abs_avg=27.707290649414062
production_forward2 grad[56] vs paper_forward: mean_abs=0.5104281306266785, max_abs=1.875, mean_rel=0.2716549038887024, max_rel=74.2433090209961, norm_rel=0.022677533328533173, ref_abs_avg=22.06719207763672, test_abs_avg=22.064247131347656
production_forward2 grad[57] vs paper_forward: mean_abs=0.6525710225105286, max_abs=5.5, mean_rel=0.14850011467933655, max_rel=1005.9197387695312, norm_rel=0.024581292644143105, ref_abs_avg=26.63531494140625, test_abs_avg=26.635120391845703
production_forward2 grad[58] vs paper_forward: mean_abs=0.604469358921051, max_abs=3.875, mean_rel=0.25753527879714966, max_rel=1687.4998779296875, norm_rel=0.022800981998443604, ref_abs_avg=26.509746551513672, test_abs_avg=26.511091232299805
production_forward2 grad[59] vs paper_forward: mean_abs=0.45397472381591797, max_abs=1.96875, mean_rel=0.09379662573337555, max_rel=7.962927341461182, norm_rel=0.021829988807439804, ref_abs_avg=20.95742416381836, test_abs_avg=20.955810546875
production_forward2 grad[60] vs paper_forward: mean_abs=0.6135246157646179, max_abs=5.0, mean_rel=0.15376362204551697, max_rel=905.484375, norm_rel=0.024046765640378, ref_abs_avg=25.574947357177734, test_abs_avg=25.575132369995117
production_forward2 grad[61] vs paper_forward: mean_abs=0.571082353591919, max_abs=3.875, mean_rel=0.2394377887248993, max_rel=2031.2498779296875, norm_rel=0.022706272080540657, ref_abs_avg=25.189273834228516, test_abs_avg=25.19271469116211
production_forward2 grad[62] vs paper_forward: mean_abs=0.4813356399536133, max_abs=1.810546875, mean_rel=0.08132763206958771, max_rel=1.966794729232788, norm_rel=0.023881062865257263, ref_abs_avg=20.03789710998535, test_abs_avg=19.990047454833984
production_forward2 grad[63] vs paper_forward: mean_abs=0.5828136205673218, max_abs=4.5, mean_rel=0.15310761332511902, max_rel=881.3025512695312, norm_rel=0.023991694673895836, ref_abs_avg=24.356826782226562, test_abs_avg=24.3568058013916
production_forward2 grad[64] vs paper_forward: mean_abs=0.541096568107605, max_abs=3.640625, mean_rel=0.21987155079841614, max_rel=1406.2498779296875, norm_rel=0.022236965596675873, ref_abs_avg=24.29850959777832, test_abs_avg=24.285659790039062
production_forward2 grad[65] vs paper_forward: mean_abs=0.4189877510070801, max_abs=1.796875, mean_rel=0.1311453878879547, max_rel=15.116660118103027, norm_rel=0.02070089429616928, ref_abs_avg=20.149154663085938, test_abs_avg=20.142873764038086
production_forward2 grad[66] vs paper_forward: mean_abs=0.5488709211349487, max_abs=4.0, mean_rel=0.14412543177604675, max_rel=888.4699096679688, norm_rel=0.02334894798696041, ref_abs_avg=23.551830291748047, test_abs_avg=23.553260803222656
production_forward2 grad[67] vs paper_forward: mean_abs=0.5078493356704712, max_abs=3.625, mean_rel=0.20572373270988464, max_rel=1718.7498779296875, norm_rel=0.02204020693898201, ref_abs_avg=23.062036514282227, test_abs_avg=23.055347442626953
production_forward2 grad[68] vs paper_forward: mean_abs=0.3983802795410156, max_abs=1.5, mean_rel=0.0849686786532402, max_rel=5.258861064910889, norm_rel=0.02145632542669773, ref_abs_avg=19.03464126586914, test_abs_avg=19.00490951538086
production_forward2 grad[69] vs paper_forward: mean_abs=0.5250870585441589, max_abs=5.0, mean_rel=0.14491060376167297, max_rel=885.2892456054688, norm_rel=0.023185377940535545, ref_abs_avg=22.655019760131836, test_abs_avg=22.6541748046875
production_forward2 grad[70] vs paper_forward: mean_abs=0.4826560616493225, max_abs=3.4375, mean_rel=0.2104124128818512, max_rel=1749.9998779296875, norm_rel=0.0212597344070673, ref_abs_avg=22.674694061279297, test_abs_avg=22.67238998413086
production_forward2 grad[71] vs paper_forward: mean_abs=0.37903594970703125, max_abs=1.625, mean_rel=0.06622505187988281, max_rel=2.6126809120178223, norm_rel=0.019725989550352097, ref_abs_avg=19.406997680664062, test_abs_avg=19.374704360961914
production_forward2 grad[72] vs paper_forward: mean_abs=0.5010420083999634, max_abs=4.0, mean_rel=0.14193086326122284, max_rel=955.0009765625, norm_rel=0.022592177614569664, ref_abs_avg=22.227554321289062, test_abs_avg=22.225570678710938
production_forward2 grad[73] vs paper_forward: mean_abs=0.4574907720088959, max_abs=3.5, mean_rel=0.22336798906326294, max_rel=2953.124755859375, norm_rel=0.021044427528977394, ref_abs_avg=21.710683822631836, test_abs_avg=21.708148956298828
production_forward2 grad[74] vs paper_forward: mean_abs=0.42830610275268555, max_abs=1.5, mean_rel=0.08709719032049179, max_rel=9.282129287719727, norm_rel=0.02099594846367836, ref_abs_avg=20.071014404296875, test_abs_avg=20.09124755859375
production_forward2 grad[75] vs paper_forward: mean_abs=0.55229252576828, max_abs=5.0, mean_rel=0.14493289589881897, max_rel=932.7471313476562, norm_rel=0.023905323818325996, ref_abs_avg=23.182941436767578, test_abs_avg=23.18185806274414
production_forward2 grad[76] vs paper_forward: mean_abs=0.514480471611023, max_abs=3.6875, mean_rel=0.2153533548116684, max_rel=1406.2498779296875, norm_rel=0.022298775613307953, ref_abs_avg=23.14560317993164, test_abs_avg=23.14116096496582
production_forward2 grad[77] vs paper_forward: mean_abs=0.43393588066101074, max_abs=1.5, mean_rel=0.26674944162368774, max_rel=84.2615966796875, norm_rel=0.02308546006679535, ref_abs_avg=18.440889358520508, test_abs_avg=18.410045623779297
production_forward2 grad[78] vs paper_forward: mean_abs=0.5161359310150146, max_abs=4.5, mean_rel=0.14803683757781982, max_rel=919.049072265625, norm_rel=0.02343081310391426, ref_abs_avg=22.082088470458984, test_abs_avg=22.082706451416016
production_forward2 grad[79] vs paper_forward: mean_abs=0.47299909591674805, max_abs=3.5, mean_rel=0.19502457976341248, max_rel=1406.2498779296875, norm_rel=0.02205968275666237, ref_abs_avg=21.508865356445312, test_abs_avg=21.509166717529297
production_forward2 grad[80] vs paper_forward: mean_abs=0.36762094497680664, max_abs=1.5, mean_rel=0.08740536868572235, max_rel=9.375608444213867, norm_rel=0.022170038893818855, ref_abs_avg=16.764415740966797, test_abs_avg=16.780704498291016
production_forward2 grad[81] vs paper_forward: mean_abs=0.4741615056991577, max_abs=5.0, mean_rel=0.14345607161521912, max_rel=738.9680786132812, norm_rel=0.022899530827999115, ref_abs_avg=20.74722671508789, test_abs_avg=20.747314453125
production_forward2 grad[82] vs paper_forward: mean_abs=0.4347360134124756, max_abs=3.625, mean_rel=0.2100261151790619, max_rel=1499.9998779296875, norm_rel=0.02110302820801735, ref_abs_avg=20.62359046936035, test_abs_avg=20.632047653198242
production_forward2 grad[83] vs paper_forward: mean_abs=0.34834039211273193, max_abs=1.5, mean_rel=0.16009356081485748, max_rel=21.59204864501953, norm_rel=0.02047465555369854, ref_abs_avg=16.836017608642578, test_abs_avg=16.829429626464844
production_forward2 grad[84] vs paper_forward: mean_abs=0.4453967213630676, max_abs=5.0, mean_rel=0.13514816761016846, max_rel=1094.46630859375, norm_rel=0.022433750331401825, ref_abs_avg=19.935256958007812, test_abs_avg=19.93401336669922
production_forward2 grad[85] vs paper_forward: mean_abs=0.4050225615501404, max_abs=3.46875, mean_rel=0.20825523138046265, max_rel=1125.0, norm_rel=0.020743248984217644, ref_abs_avg=19.62401580810547, test_abs_avg=19.614974975585938
production_forward2 grad[86] vs paper_forward: mean_abs=0.33206993341445923, max_abs=1.25, mean_rel=0.2794174551963806, max_rel=109.55994415283203, norm_rel=0.0212470144033432, ref_abs_avg=15.685928344726562, test_abs_avg=15.70103645324707
production_forward2 grad[87] vs paper_forward: mean_abs=0.41600143909454346, max_abs=4.0, mean_rel=0.13582076132297516, max_rel=834.6943969726562, norm_rel=0.021648865193128586, ref_abs_avg=19.35485076904297, test_abs_avg=19.354522705078125
production_forward2 grad[88] vs paper_forward: mean_abs=0.38097044825553894, max_abs=3.5625, mean_rel=0.16445371508598328, max_rel=1109.375, norm_rel=0.01938813365995884, ref_abs_avg=19.675987243652344, test_abs_avg=19.674938201904297
production_forward2 grad[89] vs paper_forward: mean_abs=0.3102610111236572, max_abs=1.625, mean_rel=0.23847411572933197, max_rel=64.7461929321289, norm_rel=0.019590437412261963, ref_abs_avg=15.819189071655273, test_abs_avg=15.846271514892578
production_forward2 grad[90] vs paper_forward: mean_abs=0.40484344959259033, max_abs=4.5, mean_rel=0.1310211569070816, max_rel=609.8453979492188, norm_rel=0.02145964838564396, ref_abs_avg=19.026586532592773, test_abs_avg=19.025909423828125
production_forward2 grad[91] vs paper_forward: mean_abs=0.3615040183067322, max_abs=4.25, mean_rel=0.15096093714237213, max_rel=874.9999389648438, norm_rel=0.019761638715863228, ref_abs_avg=18.506004333496094, test_abs_avg=18.51047134399414
production_forward2 grad[92] vs paper_forward: mean_abs=0.29337453842163086, max_abs=1.0625, mean_rel=0.05881282314658165, max_rel=3.3062562942504883, norm_rel=0.01832285150885582, ref_abs_avg=16.213098526000977, test_abs_avg=16.18859100341797
production_forward2 grad[93] vs paper_forward: mean_abs=0.37650996446609497, max_abs=4.0, mean_rel=0.1250913143157959, max_rel=549.5801391601562, norm_rel=0.02085096761584282, ref_abs_avg=18.272499084472656, test_abs_avg=18.272598266601562
production_forward2 grad[94] vs paper_forward: mean_abs=0.33221861720085144, max_abs=3.0, mean_rel=0.1547526866197586, max_rel=1328.1248779296875, norm_rel=0.018916215747594833, ref_abs_avg=17.811420440673828, test_abs_avg=17.821529388427734
production_forward2 grad[95] vs paper_forward: mean_abs=0.2953453063964844, max_abs=1.125, mean_rel=0.05940317362546921, max_rel=2.12803316116333, norm_rel=0.019109800457954407, ref_abs_avg=15.829227447509766, test_abs_avg=15.832965850830078
production_forward2 grad[96] vs paper_forward: mean_abs=0.36464250087738037, max_abs=5.5, mean_rel=0.139136403799057, max_rel=1013.2445068359375, norm_rel=0.02069116197526455, ref_abs_avg=17.929426193237305, test_abs_avg=17.93023109436035
production_forward2 grad[97] vs paper_forward: mean_abs=0.3205823302268982, max_abs=3.5, mean_rel=0.18576085567474365, max_rel=1312.4998779296875, norm_rel=0.01929895021021366, ref_abs_avg=16.957149505615234, test_abs_avg=16.95671844482422
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  166.113 ms
torch_compile_phases_forward bwd-only: 132.816 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB
paper_forward fwd+bwd:  382.505 ms
paper_forward bwd-only: 302.396 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.744 GiB, fwd+bwd=32.494 GiB
production_forward2 fwd+bwd:  114.521 ms
production_forward2 bwd-only: 95.979 ms
production_forward2 peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=2.326 GiB, fwd+bwd=10.326 GiB
production_forward fwd+bwd:  114.784 ms
production_forward bwd-only: 95.849 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.326 GiB, fwd+bwd=10.326 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016313892556354403, max_abs=0.03515625
production_forward grad[0] vs paper_forward: mean_abs=0.008360357023775578, max_abs=0.576171875, mean_rel=0.07242920249700546, max_rel=90.20568084716797, norm_rel=0.01983179710805416, ref_abs_avg=0.4571346640586853, test_abs_avg=0.45715320110321045
production_forward grad[1] vs paper_forward: mean_abs=7.230140209197998, max_abs=56.0, mean_rel=0.12389621138572693, max_rel=91.24006652832031, norm_rel=0.020704247057437897, ref_abs_avg=316.1289367675781, test_abs_avg=316.17584228515625
production_forward grad[2] vs paper_forward: mean_abs=1.271251916885376, max_abs=4.5, mean_rel=0.19192150235176086, max_rel=55.36253356933594, norm_rel=0.02345539815723896, ref_abs_avg=55.29058837890625, test_abs_avg=55.26341247558594
production_forward grad[3] vs paper_forward: mean_abs=1.6021825075149536, max_abs=10.5, mean_rel=0.1746152639389038, max_rel=2706.880859375, norm_rel=0.024435197934508324, ref_abs_avg=65.97898864746094, test_abs_avg=65.97982788085938
production_forward grad[4] vs paper_forward: mean_abs=1.4882960319519043, max_abs=8.125, mean_rel=0.39836597442626953, max_rel=4437.5, norm_rel=0.022948555648326874, ref_abs_avg=65.19099426269531, test_abs_avg=65.19401550292969
production_forward grad[5] vs paper_forward: mean_abs=1.113083839416504, max_abs=5.0, mean_rel=0.07292140275239944, max_rel=4.964556694030762, norm_rel=0.022231370210647583, ref_abs_avg=49.36550521850586, test_abs_avg=49.39129638671875
production_forward grad[6] vs paper_forward: mean_abs=1.414414405822754, max_abs=9.5, mean_rel=0.16736219823360443, max_rel=1091.5162353515625, norm_rel=0.02423880062997341, ref_abs_avg=58.7695198059082, test_abs_avg=58.771541595458984
production_forward grad[7] vs paper_forward: mean_abs=1.3085455894470215, max_abs=8.25, mean_rel=0.3503936529159546, max_rel=4500.0, norm_rel=0.022532889619469643, ref_abs_avg=58.43223571777344, test_abs_avg=58.43103790283203
production_forward grad[8] vs paper_forward: mean_abs=1.0581213235855103, max_abs=4.5, mean_rel=0.12304408848285675, max_rel=23.947429656982422, norm_rel=0.02359938621520996, ref_abs_avg=44.95466613769531, test_abs_avg=44.87084197998047
production_forward grad[9] vs paper_forward: mean_abs=1.2962498664855957, max_abs=8.625, mean_rel=0.16615428030490875, max_rel=2363.30029296875, norm_rel=0.024090245366096497, ref_abs_avg=54.13782501220703, test_abs_avg=54.139892578125
production_forward grad[10] vs paper_forward: mean_abs=1.194239616394043, max_abs=8.0, mean_rel=0.32466110587120056, max_rel=3249.999755859375, norm_rel=0.022386258468031883, ref_abs_avg=53.61656951904297, test_abs_avg=53.62010192871094
production_forward grad[11] vs paper_forward: mean_abs=0.9371986389160156, max_abs=4.0, mean_rel=0.08519857376813889, max_rel=3.461043357849121, norm_rel=0.022589463740587234, ref_abs_avg=41.002601623535156, test_abs_avg=40.97568130493164
production_forward grad[12] vs paper_forward: mean_abs=1.1878052949905396, max_abs=8.0, mean_rel=0.15416193008422852, max_rel=1843.07666015625, norm_rel=0.023940851911902428, ref_abs_avg=49.96805953979492, test_abs_avg=49.971920013427734
production_forward grad[13] vs paper_forward: mean_abs=1.099637746810913, max_abs=7.0, mean_rel=0.3604939579963684, max_rel=3437.499755859375, norm_rel=0.022371385246515274, ref_abs_avg=49.43346405029297, test_abs_avg=49.433509826660156
production_forward grad[14] vs paper_forward: mean_abs=0.8219413757324219, max_abs=3.0, mean_rel=0.06527645885944366, max_rel=3.405165195465088, norm_rel=0.020168354734778404, ref_abs_avg=41.897430419921875, test_abs_avg=41.918094635009766
production_forward grad[15] vs paper_forward: mean_abs=1.1073874235153198, max_abs=9.0, mean_rel=0.16940081119537354, max_rel=1484.398193359375, norm_rel=0.023688437417149544, ref_abs_avg=47.058204650878906, test_abs_avg=47.062538146972656
production_forward grad[16] vs paper_forward: mean_abs=1.0171289443969727, max_abs=6.5, mean_rel=0.3558659851551056, max_rel=4031.249755859375, norm_rel=0.021962113678455353, ref_abs_avg=46.60114288330078, test_abs_avg=46.609622955322266
production_forward grad[17] vs paper_forward: mean_abs=0.8068246841430664, max_abs=3.125, mean_rel=0.08218354731798172, max_rel=6.825982093811035, norm_rel=0.022847164422273636, ref_abs_avg=35.42210006713867, test_abs_avg=35.54997634887695
production_forward grad[18] vs paper_forward: mean_abs=1.044722557067871, max_abs=10.0, mean_rel=0.16429197788238525, max_rel=1943.1431884765625, norm_rel=0.023598289117217064, ref_abs_avg=44.58131408691406, test_abs_avg=44.584007263183594
production_forward grad[19] vs paper_forward: mean_abs=0.9560554623603821, max_abs=6.0, mean_rel=0.3328208029270172, max_rel=3187.499755859375, norm_rel=0.021891599521040916, ref_abs_avg=43.953468322753906, test_abs_avg=43.957366943359375
production_forward grad[20] vs paper_forward: mean_abs=0.7855392694473267, max_abs=3.75, mean_rel=0.13930097222328186, max_rel=10.289754867553711, norm_rel=0.02296854369342327, ref_abs_avg=33.45268630981445, test_abs_avg=33.477516174316406
production_forward grad[21] vs paper_forward: mean_abs=0.9917170405387878, max_abs=8.0, mean_rel=0.15550044178962708, max_rel=1928.2315673828125, norm_rel=0.023535335436463356, ref_abs_avg=42.37987518310547, test_abs_avg=42.378623962402344
production_forward grad[22] vs paper_forward: mean_abs=0.906889796257019, max_abs=5.5, mean_rel=0.2681710720062256, max_rel=2343.75, norm_rel=0.021659046411514282, ref_abs_avg=42.093833923339844, test_abs_avg=42.095428466796875
production_forward grad[23] vs paper_forward: mean_abs=0.7212333679199219, max_abs=3.375, mean_rel=0.06501206755638123, max_rel=6.47716760635376, norm_rel=0.021727025508880615, ref_abs_avg=34.67110824584961, test_abs_avg=34.661720275878906
production_forward grad[24] vs paper_forward: mean_abs=0.9446815848350525, max_abs=7.0, mean_rel=0.1549738496541977, max_rel=1752.0164794921875, norm_rel=0.023368095979094505, ref_abs_avg=40.68248748779297, test_abs_avg=40.68642044067383
production_forward grad[25] vs paper_forward: mean_abs=0.8639928102493286, max_abs=5.125, mean_rel=0.2965167164802551, max_rel=3343.749755859375, norm_rel=0.021672679111361504, ref_abs_avg=40.059486389160156, test_abs_avg=40.06260681152344
production_forward grad[26] vs paper_forward: mean_abs=0.8376836776733398, max_abs=3.875, mean_rel=0.1342201828956604, max_rel=24.726665496826172, norm_rel=0.02393110655248165, ref_abs_avg=35.759674072265625, test_abs_avg=35.719032287597656
production_forward grad[27] vs paper_forward: mean_abs=1.0806834697723389, max_abs=7.0, mean_rel=0.17650535702705383, max_rel=1617.5684814453125, norm_rel=0.025285694748163223, ref_abs_avg=42.96768569946289, test_abs_avg=42.972869873046875
production_forward grad[28] vs paper_forward: mean_abs=1.0053942203521729, max_abs=6.5, mean_rel=0.36379680037498474, max_rel=2937.499755859375, norm_rel=0.023597730323672295, ref_abs_avg=42.75102233886719, test_abs_avg=42.76754379272461
production_forward grad[29] vs paper_forward: mean_abs=0.7820663452148438, max_abs=3.0, mean_rel=0.124895378947258, max_rel=10.393881797790527, norm_rel=0.02434229850769043, ref_abs_avg=31.591154098510742, test_abs_avg=31.673290252685547
production_forward grad[30] vs paper_forward: mean_abs=0.993965208530426, max_abs=7.25, mean_rel=0.16599109768867493, max_rel=1610.7845458984375, norm_rel=0.025456368923187256, ref_abs_avg=39.21950912475586, test_abs_avg=39.22383499145508
production_forward grad[31] vs paper_forward: mean_abs=0.9272017478942871, max_abs=6.0, mean_rel=0.28127408027648926, max_rel=3187.499755859375, norm_rel=0.02409372851252556, ref_abs_avg=38.627723693847656, test_abs_avg=38.621307373046875
production_forward grad[32] vs paper_forward: mean_abs=0.7309741973876953, max_abs=3.5, mean_rel=0.07352983951568604, max_rel=3.8389322757720947, norm_rel=0.02405511774122715, ref_abs_avg=30.683849334716797, test_abs_avg=30.616159439086914
production_forward grad[33] vs paper_forward: mean_abs=0.9268850088119507, max_abs=7.0, mean_rel=0.15828251838684082, max_rel=904.5311279296875, norm_rel=0.025316692888736725, ref_abs_avg=36.809783935546875, test_abs_avg=36.811805725097656
production_forward grad[34] vs paper_forward: mean_abs=0.8647952079772949, max_abs=6.0, mean_rel=0.28222236037254333, max_rel=2624.999755859375, norm_rel=0.0238359235227108, ref_abs_avg=36.37580108642578, test_abs_avg=36.382869720458984
production_forward grad[35] vs paper_forward: mean_abs=0.6727193593978882, max_abs=2.875, mean_rel=0.1874871402978897, max_rel=22.69371795654297, norm_rel=0.023911457508802414, ref_abs_avg=28.662532806396484, test_abs_avg=28.64640998840332
production_forward grad[36] vs paper_forward: mean_abs=0.8795458078384399, max_abs=6.0, mean_rel=0.16198988258838654, max_rel=1255.30126953125, norm_rel=0.025266895070672035, ref_abs_avg=34.98015594482422, test_abs_avg=34.98418426513672
production_forward grad[37] vs paper_forward: mean_abs=0.8060752153396606, max_abs=5.5, mean_rel=0.2915588617324829, max_rel=2624.999755859375, norm_rel=0.02344924584031105, ref_abs_avg=34.52879333496094, test_abs_avg=34.52411651611328
production_forward grad[38] vs paper_forward: mean_abs=0.6068005561828613, max_abs=2.75, mean_rel=0.18486414849758148, max_rel=35.956268310546875, norm_rel=0.022715186700224876, ref_abs_avg=26.92026138305664, test_abs_avg=26.951061248779297
production_forward grad[39] vs paper_forward: mean_abs=0.825076699256897, max_abs=6.5, mean_rel=0.15465039014816284, max_rel=1643.2977294921875, norm_rel=0.024767527356743813, ref_abs_avg=33.438331604003906, test_abs_avg=33.439414978027344
production_forward grad[40] vs paper_forward: mean_abs=0.7645418047904968, max_abs=5.71875, mean_rel=0.2379835546016693, max_rel=1843.7498779296875, norm_rel=0.023423120379447937, ref_abs_avg=32.76579284667969, test_abs_avg=32.77369689941406
production_forward grad[41] vs paper_forward: mean_abs=0.6067581176757812, max_abs=2.125, mean_rel=0.0675916075706482, max_rel=3.453254222869873, norm_rel=0.021817686036229134, ref_abs_avg=27.9495849609375, test_abs_avg=27.98029327392578
production_forward grad[42] vs paper_forward: mean_abs=0.7780647277832031, max_abs=6.0, mean_rel=0.16779474914073944, max_rel=1239.3958740234375, norm_rel=0.02479618601500988, ref_abs_avg=31.503814697265625, test_abs_avg=31.50368881225586
production_forward grad[43] vs paper_forward: mean_abs=0.7206192016601562, max_abs=4.5, mean_rel=0.251176655292511, max_rel=2375.0, norm_rel=0.023181216791272163, ref_abs_avg=31.1804256439209, test_abs_avg=31.189128875732422
production_forward grad[44] vs paper_forward: mean_abs=0.6022922992706299, max_abs=2.40625, mean_rel=0.11188706010580063, max_rel=12.884599685668945, norm_rel=0.02334008365869522, ref_abs_avg=25.768295288085938, test_abs_avg=25.755529403686523
production_forward grad[45] vs paper_forward: mean_abs=0.7402132749557495, max_abs=6.0, mean_rel=0.15097060799598694, max_rel=589.0875244140625, norm_rel=0.024383801966905594, ref_abs_avg=30.45500946044922, test_abs_avg=30.458669662475586
production_forward grad[46] vs paper_forward: mean_abs=0.6896489858627319, max_abs=4.5, mean_rel=0.2426171451807022, max_rel=1843.7498779296875, norm_rel=0.02322925068438053, ref_abs_avg=29.782922744750977, test_abs_avg=29.786386489868164
production_forward grad[47] vs paper_forward: mean_abs=0.5921297073364258, max_abs=2.125, mean_rel=0.16252441704273224, max_rel=16.540922164916992, norm_rel=0.024778246879577637, ref_abs_avg=23.543964385986328, test_abs_avg=23.51601791381836
production_forward grad[48] vs paper_forward: mean_abs=0.7126996517181396, max_abs=5.0, mean_rel=0.16143333911895752, max_rel=1399.2972412109375, norm_rel=0.024117032065987587, ref_abs_avg=29.640274047851562, test_abs_avg=29.643701553344727
production_forward grad[49] vs paper_forward: mean_abs=0.6576024293899536, max_abs=4.40625, mean_rel=0.2914494276046753, max_rel=2999.999755859375, norm_rel=0.022606829181313515, ref_abs_avg=29.134721755981445, test_abs_avg=29.135408401489258
production_forward grad[50] vs paper_forward: mean_abs=0.6236977577209473, max_abs=2.28125, mean_rel=0.14426285028457642, max_rel=22.958362579345703, norm_rel=0.02485617808997631, ref_abs_avg=25.504371643066406, test_abs_avg=25.592693328857422
production_forward grad[51] vs paper_forward: mean_abs=0.7824324369430542, max_abs=5.5, mean_rel=0.16814231872558594, max_rel=1526.4857177734375, norm_rel=0.025518128648400307, ref_abs_avg=30.717647552490234, test_abs_avg=30.719627380371094
production_forward grad[52] vs paper_forward: mean_abs=0.7363066673278809, max_abs=5.0, mean_rel=0.2832707464694977, max_rel=2406.25, norm_rel=0.0243818536400795, ref_abs_avg=30.348173141479492, test_abs_avg=30.35786247253418
production_forward grad[53] vs paper_forward: mean_abs=0.5503036975860596, max_abs=2.40625, mean_rel=0.13545483350753784, max_rel=11.206161499023438, norm_rel=0.023859873414039612, ref_abs_avg=23.157258987426758, test_abs_avg=23.16675567626953
production_forward grad[54] vs paper_forward: mean_abs=0.7188122272491455, max_abs=5.0, mean_rel=0.15282192826271057, max_rel=858.67333984375, norm_rel=0.025247154757380486, ref_abs_avg=28.569644927978516, test_abs_avg=28.570791244506836
production_forward grad[55] vs paper_forward: mean_abs=0.6692876219749451, max_abs=4.703125, mean_rel=0.26527345180511475, max_rel=2343.75, norm_rel=0.023871060460805893, ref_abs_avg=28.11749267578125, test_abs_avg=28.1195011138916
production_forward grad[56] vs paper_forward: mean_abs=0.5431421399116516, max_abs=2.5, mean_rel=0.12052010744810104, max_rel=19.008771896362305, norm_rel=0.024558329954743385, ref_abs_avg=22.331819534301758, test_abs_avg=22.358386993408203
production_forward grad[57] vs paper_forward: mean_abs=0.6672368049621582, max_abs=5.0, mean_rel=0.15374541282653809, max_rel=1143.6292724609375, norm_rel=0.024722924456000328, ref_abs_avg=27.045833587646484, test_abs_avg=27.049001693725586
production_forward grad[58] vs paper_forward: mean_abs=0.6199169158935547, max_abs=4.0, mean_rel=0.2474198192358017, max_rel=1781.2498779296875, norm_rel=0.023248059675097466, ref_abs_avg=26.65519905090332, test_abs_avg=26.6595516204834
production_forward grad[59] vs paper_forward: mean_abs=0.45937395095825195, max_abs=1.875, mean_rel=0.09404498338699341, max_rel=10.772417068481445, norm_rel=0.020653335377573967, ref_abs_avg=22.538625717163086, test_abs_avg=22.527690887451172
production_forward grad[60] vs paper_forward: mean_abs=0.6254257559776306, max_abs=4.5, mean_rel=0.16350682079792023, max_rel=1382.436279296875, norm_rel=0.02430075779557228, ref_abs_avg=25.78514862060547, test_abs_avg=25.786434173583984
production_forward grad[61] vs paper_forward: mean_abs=0.5771828889846802, max_abs=4.25, mean_rel=0.2229519784450531, max_rel=1468.7498779296875, norm_rel=0.022848907858133316, ref_abs_avg=25.269500732421875, test_abs_avg=25.272714614868164
production_forward grad[62] vs paper_forward: mean_abs=0.4709959030151367, max_abs=1.921875, mean_rel=0.08542311191558838, max_rel=3.2811179161071777, norm_rel=0.024108219891786575, ref_abs_avg=19.84964942932129, test_abs_avg=19.896900177001953
production_forward grad[63] vs paper_forward: mean_abs=0.5893779993057251, max_abs=4.0, mean_rel=0.14785289764404297, max_rel=886.6626586914062, norm_rel=0.023928403854370117, ref_abs_avg=24.661380767822266, test_abs_avg=24.66254997253418
production_forward grad[64] vs paper_forward: mean_abs=0.5445737838745117, max_abs=4.0, mean_rel=0.2083345353603363, max_rel=1960.9373779296875, norm_rel=0.0223789494484663, ref_abs_avg=24.352123260498047, test_abs_avg=24.349353790283203
production_forward grad[65] vs paper_forward: mean_abs=0.4510459899902344, max_abs=1.9375, mean_rel=0.0746384859085083, max_rel=2.038895845413208, norm_rel=0.024041462689638138, ref_abs_avg=18.710163116455078, test_abs_avg=18.682571411132812
production_forward grad[66] vs paper_forward: mean_abs=0.5652018785476685, max_abs=4.5, mean_rel=0.1479724496603012, max_rel=803.509765625, norm_rel=0.023505793884396553, ref_abs_avg=24.070829391479492, test_abs_avg=24.07195281982422
production_forward grad[67] vs paper_forward: mean_abs=0.5201119780540466, max_abs=4.0, mean_rel=0.20278400182724, max_rel=1468.7498779296875, norm_rel=0.022284243255853653, ref_abs_avg=23.38620376586914, test_abs_avg=23.39105796813965
production_forward grad[68] vs paper_forward: mean_abs=0.39992475509643555, max_abs=1.75, mean_rel=0.13022232055664062, max_rel=18.182132720947266, norm_rel=0.02046906016767025, ref_abs_avg=19.657333374023438, test_abs_avg=19.66166114807129
production_forward grad[69] vs paper_forward: mean_abs=0.5296616554260254, max_abs=5.0, mean_rel=0.14650721848011017, max_rel=1294.067138671875, norm_rel=0.0233091339468956, ref_abs_avg=22.7923583984375, test_abs_avg=22.793577194213867
production_forward grad[70] vs paper_forward: mean_abs=0.49223196506500244, max_abs=3.5, mean_rel=0.2454877495765686, max_rel=1453.1248779296875, norm_rel=0.02175818383693695, ref_abs_avg=22.691104888916016, test_abs_avg=22.69333267211914
production_forward grad[71] vs paper_forward: mean_abs=0.40819549560546875, max_abs=2.0, mean_rel=0.06920896470546722, max_rel=2.7646796703338623, norm_rel=0.022432349622249603, ref_abs_avg=18.563365936279297, test_abs_avg=18.537857055664062
production_forward grad[72] vs paper_forward: mean_abs=0.5159823298454285, max_abs=4.0, mean_rel=0.14392060041427612, max_rel=895.0335083007812, norm_rel=0.022997237741947174, ref_abs_avg=22.468917846679688, test_abs_avg=22.469959259033203
production_forward grad[73] vs paper_forward: mean_abs=0.479347825050354, max_abs=4.125, mean_rel=0.21302610635757446, max_rel=1765.6248779296875, norm_rel=0.021539034321904182, ref_abs_avg=22.269960403442383, test_abs_avg=22.2828311920166
production_forward grad[74] vs paper_forward: mean_abs=0.4391845464706421, max_abs=2.125, mean_rel=0.1849086582660675, max_rel=43.76068115234375, norm_rel=0.02485240250825882, ref_abs_avg=18.00374984741211, test_abs_avg=17.964996337890625
production_forward grad[75] vs paper_forward: mean_abs=0.568304717540741, max_abs=4.5, mean_rel=0.16264963150024414, max_rel=902.0363159179688, norm_rel=0.02421688660979271, ref_abs_avg=23.518569946289062, test_abs_avg=23.518871307373047
production_forward grad[76] vs paper_forward: mean_abs=0.5266096591949463, max_abs=3.5, mean_rel=0.2154485285282135, max_rel=1687.4998779296875, norm_rel=0.023115966469049454, ref_abs_avg=22.891944885253906, test_abs_avg=22.895641326904297
production_forward grad[77] vs paper_forward: mean_abs=0.3953573703765869, max_abs=1.546875, mean_rel=0.16003447771072388, max_rel=23.987743377685547, norm_rel=0.022811459377408028, ref_abs_avg=17.222578048706055, test_abs_avg=17.231029510498047
production_forward grad[78] vs paper_forward: mean_abs=0.5165510177612305, max_abs=6.0, mean_rel=0.15445753931999207, max_rel=784.023681640625, norm_rel=0.024031352251768112, ref_abs_avg=21.525249481201172, test_abs_avg=21.525390625
production_forward grad[79] vs paper_forward: mean_abs=0.48478609323501587, max_abs=4.5, mean_rel=0.1964009553194046, max_rel=1343.7498779296875, norm_rel=0.022552894428372383, ref_abs_avg=21.586936950683594, test_abs_avg=21.58814239501953
production_forward grad[80] vs paper_forward: mean_abs=0.3940887451171875, max_abs=1.625, mean_rel=0.08560138195753098, max_rel=5.837981700897217, norm_rel=0.022299449890851974, ref_abs_avg=17.617557525634766, test_abs_avg=17.58450698852539
production_forward grad[81] vs paper_forward: mean_abs=0.4928298592567444, max_abs=5.0, mean_rel=0.14090682566165924, max_rel=800.2783813476562, norm_rel=0.023380892351269722, ref_abs_avg=21.1358585357666, test_abs_avg=21.136211395263672
production_forward grad[82] vs paper_forward: mean_abs=0.44720470905303955, max_abs=3.75, mean_rel=0.2287541627883911, max_rel=1968.7498779296875, norm_rel=0.021772772073745728, ref_abs_avg=20.575767517089844, test_abs_avg=20.57564926147461
production_forward grad[83] vs paper_forward: mean_abs=0.33385777473449707, max_abs=1.375, mean_rel=0.0700274407863617, max_rel=2.7372262477874756, norm_rel=0.019712474197149277, ref_abs_avg=17.041610717773438, test_abs_avg=17.026565551757812
production_forward grad[84] vs paper_forward: mean_abs=0.4614180028438568, max_abs=4.25, mean_rel=0.1401917189359665, max_rel=680.0582275390625, norm_rel=0.023030007258057594, ref_abs_avg=20.124649047851562, test_abs_avg=20.125041961669922
production_forward grad[85] vs paper_forward: mean_abs=0.4206022024154663, max_abs=4.0, mean_rel=0.17964306473731995, max_rel=1374.9998779296875, norm_rel=0.021027822047472, ref_abs_avg=20.1032657623291, test_abs_avg=20.102519989013672
production_forward grad[86] vs paper_forward: mean_abs=0.3278985023498535, max_abs=1.5625, mean_rel=0.17684265971183777, max_rel=36.745445251464844, norm_rel=0.019335957244038582, ref_abs_avg=16.717979431152344, test_abs_avg=16.704288482666016
production_forward grad[87] vs paper_forward: mean_abs=0.43260106444358826, max_abs=4.75, mean_rel=0.1442503035068512, max_rel=1479.7857666015625, norm_rel=0.022314952686429024, ref_abs_avg=19.494869232177734, test_abs_avg=19.49595832824707
production_forward grad[88] vs paper_forward: mean_abs=0.3890379071235657, max_abs=3.625, mean_rel=0.18372918665409088, max_rel=1250.0, norm_rel=0.020519398152828217, ref_abs_avg=19.080242156982422, test_abs_avg=19.086559295654297
production_forward grad[89] vs paper_forward: mean_abs=0.31422853469848633, max_abs=1.125, mean_rel=0.07113969326019287, max_rel=3.315002679824829, norm_rel=0.019842514768242836, ref_abs_avg=15.506451606750488, test_abs_avg=15.523185729980469
production_forward grad[90] vs paper_forward: mean_abs=0.41116324067115784, max_abs=4.5, mean_rel=0.1322895884513855, max_rel=526.280517578125, norm_rel=0.021898765116930008, ref_abs_avg=18.93517303466797, test_abs_avg=18.93642807006836
production_forward grad[91] vs paper_forward: mean_abs=0.3659934401512146, max_abs=4.0, mean_rel=0.1876693069934845, max_rel=2187.5, norm_rel=0.019648950546979904, ref_abs_avg=18.755334854125977, test_abs_avg=18.76229476928711
production_forward grad[92] vs paper_forward: mean_abs=0.2866753339767456, max_abs=1.25, mean_rel=0.07143682986497879, max_rel=3.6577136516571045, norm_rel=0.019425395876169205, ref_abs_avg=15.375432014465332, test_abs_avg=15.41140365600586
production_forward grad[93] vs paper_forward: mean_abs=0.3855721652507782, max_abs=5.0, mean_rel=0.12654301524162292, max_rel=745.4207763671875, norm_rel=0.02152019366621971, ref_abs_avg=18.162755966186523, test_abs_avg=18.16251564025879
production_forward grad[94] vs paper_forward: mean_abs=0.3565310835838318, max_abs=3.75, mean_rel=0.1711653620004654, max_rel=945.3124389648438, norm_rel=0.020335860550403595, ref_abs_avg=17.86468505859375, test_abs_avg=17.867502212524414
production_forward grad[95] vs paper_forward: mean_abs=0.3032264709472656, max_abs=1.375, mean_rel=0.058231666684150696, max_rel=1.4242360591888428, norm_rel=0.019931627437472343, ref_abs_avg=15.081235885620117, test_abs_avg=15.061223983764648
production_forward grad[96] vs paper_forward: mean_abs=0.3705568313598633, max_abs=4.5, mean_rel=0.12291060388088226, max_rel=476.0748596191406, norm_rel=0.020971911028027534, ref_abs_avg=17.938610076904297, test_abs_avg=17.939157485961914
production_forward grad[97] vs paper_forward: mean_abs=0.32232263684272766, max_abs=4.5, mean_rel=0.15603779256343842, max_rel=999.9999389648438, norm_rel=0.018378274515271187, ref_abs_avg=17.811050415039062, test_abs_avg=17.809986114501953
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016356657724827528, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008714242838323116, max_abs=0.615234375, mean_rel=0.075154609978199, max_rel=104.2652587890625, norm_rel=0.020548908039927483, ref_abs_avg=0.4571346640586853, test_abs_avg=0.45713210105895996
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.3654093742370605, max_abs=48.0, mean_rel=0.13121512532234192, max_rel=168.46865844726562, norm_rel=0.021052315831184387, ref_abs_avg=316.1289367675781, test_abs_avg=316.1097717285156
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.260469675064087, max_abs=4.5, mean_rel=0.19754686951637268, max_rel=52.443695068359375, norm_rel=0.023218365386128426, ref_abs_avg=55.29058837890625, test_abs_avg=55.24981689453125
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6560982465744019, max_abs=12.0, mean_rel=0.17967090010643005, max_rel=2266.73486328125, norm_rel=0.02525843121111393, ref_abs_avg=65.97898864746094, test_abs_avg=65.97491455078125
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5483962297439575, max_abs=10.0, mean_rel=0.43946459889411926, max_rel=5187.49951171875, norm_rel=0.023832816630601883, ref_abs_avg=65.19099426269531, test_abs_avg=65.19239807128906
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1590185165405273, max_abs=5.0, mean_rel=0.07347054034471512, max_rel=3.7389447689056396, norm_rel=0.02340450882911682, ref_abs_avg=49.36550521850586, test_abs_avg=49.460628509521484
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4583253860473633, max_abs=10.0, mean_rel=0.1713230311870575, max_rel=1277.640625, norm_rel=0.02498365007340908, ref_abs_avg=58.7695198059082, test_abs_avg=58.76799011230469
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3547189235687256, max_abs=8.5, mean_rel=0.3763529658317566, max_rel=3874.999755859375, norm_rel=0.023335622623562813, ref_abs_avg=58.43223571777344, test_abs_avg=58.42888641357422
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0962705612182617, max_abs=4.25, mean_rel=0.1086805909872055, max_rel=8.070680618286133, norm_rel=0.024633219465613365, ref_abs_avg=44.95466613769531, test_abs_avg=44.88829040527344
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3345532417297363, max_abs=8.75, mean_rel=0.17148947715759277, max_rel=2790.21484375, norm_rel=0.02477947808802128, ref_abs_avg=54.13782501220703, test_abs_avg=54.13686752319336
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2344340085983276, max_abs=7.125, mean_rel=0.33610838651657104, max_rel=3874.999755859375, norm_rel=0.02313203737139702, ref_abs_avg=53.61656951904297, test_abs_avg=53.61708450317383
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9188380241394043, max_abs=3.75, mean_rel=0.08519011735916138, max_rel=5.120362758636475, norm_rel=0.02235722355544567, ref_abs_avg=41.002601623535156, test_abs_avg=40.97972869873047
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2219209671020508, max_abs=8.5, mean_rel=0.15516261756420135, max_rel=1547.396240234375, norm_rel=0.02459012344479561, ref_abs_avg=49.96805953979492, test_abs_avg=49.96744155883789
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1333119869232178, max_abs=6.5, mean_rel=0.3606302738189697, max_rel=3249.999755859375, norm_rel=0.02301948145031929, ref_abs_avg=49.43346405029297, test_abs_avg=49.431732177734375
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8683021068572998, max_abs=3.5, mean_rel=0.06377602368593216, max_rel=1.9444535970687866, norm_rel=0.020973490551114082, ref_abs_avg=41.897430419921875, test_abs_avg=41.91352844238281
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1382890939712524, max_abs=8.0, mean_rel=0.17342105507850647, max_rel=1156.8787841796875, norm_rel=0.024327516555786133, ref_abs_avg=47.058204650878906, test_abs_avg=47.05906677246094
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0499204397201538, max_abs=6.125, mean_rel=0.34430551528930664, max_rel=3812.499755859375, norm_rel=0.02267778106033802, ref_abs_avg=46.60114288330078, test_abs_avg=46.606666564941406
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8984168767929077, max_abs=3.25, mean_rel=0.08817070722579956, max_rel=8.188654899597168, norm_rel=0.024747254326939583, ref_abs_avg=35.42210006713867, test_abs_avg=35.57251739501953
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0715316534042358, max_abs=8.0, mean_rel=0.1649690568447113, max_rel=2084.684326171875, norm_rel=0.02419871650636196, ref_abs_avg=44.58131408691406, test_abs_avg=44.581382751464844
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9849607944488525, max_abs=5.75, mean_rel=0.34685003757476807, max_rel=3062.499755859375, norm_rel=0.022522607818245888, ref_abs_avg=43.953468322753906, test_abs_avg=43.95487976074219
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7771282196044922, max_abs=4.25, mean_rel=0.13125453889369965, max_rel=8.050291061401367, norm_rel=0.022857872769236565, ref_abs_avg=33.45268630981445, test_abs_avg=33.470603942871094
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0134235620498657, max_abs=9.0, mean_rel=0.15878447890281677, max_rel=1322.843017578125, norm_rel=0.024053262546658516, ref_abs_avg=42.37987518310547, test_abs_avg=42.37639617919922
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9331857562065125, max_abs=6.25, mean_rel=0.2643422782421112, max_rel=2343.75, norm_rel=0.022270290181040764, ref_abs_avg=42.093833923339844, test_abs_avg=42.09333801269531
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7744312286376953, max_abs=3.75, mean_rel=0.057787470519542694, max_rel=2.235968828201294, norm_rel=0.022997764870524406, ref_abs_avg=34.67110824584961, test_abs_avg=34.67571258544922
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9649473428726196, max_abs=6.0, mean_rel=0.15465417504310608, max_rel=1894.759521484375, norm_rel=0.023851986974477768, ref_abs_avg=40.68248748779297, test_abs_avg=40.68400955200195
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8870656490325928, max_abs=5.5, mean_rel=0.3226543068885803, max_rel=3499.999755859375, norm_rel=0.022250136360526085, ref_abs_avg=40.059486389160156, test_abs_avg=40.060829162597656
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.836036205291748, max_abs=3.5, mean_rel=0.1310352087020874, max_rel=17.336503982543945, norm_rel=0.023715320974588394, ref_abs_avg=35.759674072265625, test_abs_avg=35.750030517578125
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1077160835266113, max_abs=10.0, mean_rel=0.18249627947807312, max_rel=2403.11279296875, norm_rel=0.025927549228072166, ref_abs_avg=42.96768569946289, test_abs_avg=42.969398498535156
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0321946144104004, max_abs=6.5, mean_rel=0.3764997124671936, max_rel=4218.75, norm_rel=0.02422354556620121, ref_abs_avg=42.75102233886719, test_abs_avg=42.768428802490234
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.778130292892456, max_abs=3.0, mean_rel=0.11867326498031616, max_rel=6.835395336151123, norm_rel=0.02464069053530693, ref_abs_avg=31.591154098510742, test_abs_avg=31.645915985107422
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.016202688217163, max_abs=8.0, mean_rel=0.1638350486755371, max_rel=966.6561889648438, norm_rel=0.026012027636170387, ref_abs_avg=39.21950912475586, test_abs_avg=39.220703125
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9483804702758789, max_abs=5.5, mean_rel=0.30177557468414307, max_rel=3531.249755859375, norm_rel=0.024660497903823853, ref_abs_avg=38.627723693847656, test_abs_avg=38.61933135986328
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7476291656494141, max_abs=3.0, mean_rel=0.08399900048971176, max_rel=3.736046314239502, norm_rel=0.024563826620578766, ref_abs_avg=30.683849334716797, test_abs_avg=30.65529441833496
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9445730447769165, max_abs=6.5, mean_rel=0.1655716896057129, max_rel=1012.3588256835938, norm_rel=0.02577989548444748, ref_abs_avg=36.809783935546875, test_abs_avg=36.80968475341797
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.881650447845459, max_abs=5.8125, mean_rel=0.2819024920463562, max_rel=2125.0, norm_rel=0.024295050650835037, ref_abs_avg=36.37580108642578, test_abs_avg=36.380859375
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6733349561691284, max_abs=2.5, mean_rel=0.19360774755477905, max_rel=32.525177001953125, norm_rel=0.023577436804771423, ref_abs_avg=28.662532806396484, test_abs_avg=28.626983642578125
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8953845500946045, max_abs=6.0, mean_rel=0.1658073514699936, max_rel=989.9470825195312, norm_rel=0.02572501264512539, ref_abs_avg=34.98015594482422, test_abs_avg=34.98317337036133
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8256315588951111, max_abs=5.0, mean_rel=0.31695419549942017, max_rel=2624.999755859375, norm_rel=0.024004600942134857, ref_abs_avg=34.52879333496094, test_abs_avg=34.523597717285156
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6291403770446777, max_abs=2.75, mean_rel=0.1775132417678833, max_rel=24.44624900817871, norm_rel=0.023337731137871742, ref_abs_avg=26.92026138305664, test_abs_avg=26.965084075927734
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8386172652244568, max_abs=6.0, mean_rel=0.1567194163799286, max_rel=1390.424560546875, norm_rel=0.025155136361718178, ref_abs_avg=33.438331604003906, test_abs_avg=33.43801498413086
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7789522409439087, max_abs=5.671875, mean_rel=0.2462860345840454, max_rel=2093.75, norm_rel=0.02386569045484066, ref_abs_avg=32.76579284667969, test_abs_avg=32.77301788330078
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6044578552246094, max_abs=2.1875, mean_rel=0.07517737150192261, max_rel=6.1846418380737305, norm_rel=0.021932968869805336, ref_abs_avg=27.9495849609375, test_abs_avg=27.98135757446289
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7894772291183472, max_abs=6.0, mean_rel=0.17271003127098083, max_rel=1280.774169921875, norm_rel=0.025165140628814697, ref_abs_avg=31.503814697265625, test_abs_avg=31.503273010253906
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7320327758789062, max_abs=4.375, mean_rel=0.2666771709918976, max_rel=2421.875, norm_rel=0.023554474115371704, ref_abs_avg=31.1804256439209, test_abs_avg=31.186742782592773
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.6004489064216614, max_abs=2.28125, mean_rel=0.11215297132730484, max_rel=9.783318519592285, norm_rel=0.02340891771018505, ref_abs_avg=25.768295288085938, test_abs_avg=25.755157470703125
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7513902187347412, max_abs=5.5, mean_rel=0.15563899278640747, max_rel=688.88623046875, norm_rel=0.02473980374634266, ref_abs_avg=30.45500946044922, test_abs_avg=30.458148956298828
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7002387046813965, max_abs=4.75, mean_rel=0.23757979273796082, max_rel=1843.7498779296875, norm_rel=0.023592989891767502, ref_abs_avg=29.782922744750977, test_abs_avg=29.78606414794922
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.590324878692627, max_abs=2.5, mean_rel=0.15887561440467834, max_rel=12.160141944885254, norm_rel=0.02479945681989193, ref_abs_avg=23.543964385986328, test_abs_avg=23.560749053955078
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7211304306983948, max_abs=6.25, mean_rel=0.16514623165130615, max_rel=1149.24169921875, norm_rel=0.024400589987635612, ref_abs_avg=29.640274047851562, test_abs_avg=29.641721725463867
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6690016984939575, max_abs=4.03125, mean_rel=0.2867008149623871, max_rel=2749.999755859375, norm_rel=0.0229896679520607, ref_abs_avg=29.134721755981445, test_abs_avg=29.13280487060547
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.629364013671875, max_abs=2.25, mean_rel=0.13211236894130707, max_rel=23.494930267333984, norm_rel=0.024992981925606728, ref_abs_avg=25.504371643066406, test_abs_avg=25.567020416259766
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7942795753479004, max_abs=5.5, mean_rel=0.17118790745735168, max_rel=1853.5416259765625, norm_rel=0.025884326547384262, ref_abs_avg=30.717647552490234, test_abs_avg=30.71768569946289
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7455384731292725, max_abs=5.0, mean_rel=0.29065781831741333, max_rel=2140.625, norm_rel=0.02465289644896984, ref_abs_avg=30.348173141479492, test_abs_avg=30.35594940185547
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5467720031738281, max_abs=2.25, mean_rel=0.10792750865221024, max_rel=11.562579154968262, norm_rel=0.024299561977386475, ref_abs_avg=23.157258987426758, test_abs_avg=23.163631439208984
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7295535206794739, max_abs=5.0, mean_rel=0.1598900854587555, max_rel=808.87255859375, norm_rel=0.025608297437429428, ref_abs_avg=28.569644927978516, test_abs_avg=28.5705623626709
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6803716421127319, max_abs=4.5, mean_rel=0.267281174659729, max_rel=2749.999755859375, norm_rel=0.024273334071040154, ref_abs_avg=28.11749267578125, test_abs_avg=28.119096755981445
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5926122665405273, max_abs=2.1875, mean_rel=0.09962129592895508, max_rel=6.175029277801514, norm_rel=0.02646588906645775, ref_abs_avg=22.331819534301758, test_abs_avg=22.336257934570312
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.676796555519104, max_abs=5.0, mean_rel=0.15687766671180725, max_rel=1492.820068359375, norm_rel=0.025073226541280746, ref_abs_avg=27.045833587646484, test_abs_avg=27.048564910888672
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6282902956008911, max_abs=4.0, mean_rel=0.26023855805397034, max_rel=2437.5, norm_rel=0.023559700697660446, ref_abs_avg=26.65519905090332, test_abs_avg=26.660968780517578
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.4596085548400879, max_abs=1.75, mean_rel=0.06636132299900055, max_rel=5.161687850952148, norm_rel=0.02074022963643074, ref_abs_avg=22.538625717163086, test_abs_avg=22.525371551513672
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6336685419082642, max_abs=5.0, mean_rel=0.168148010969162, max_rel=1234.576416015625, norm_rel=0.024607544764876366, ref_abs_avg=25.78514862060547, test_abs_avg=25.785255432128906
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5865919589996338, max_abs=4.75, mean_rel=0.24522492289543152, max_rel=1937.4998779296875, norm_rel=0.02320079132914543, ref_abs_avg=25.269500732421875, test_abs_avg=25.269174575805664
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.47376060485839844, max_abs=1.875, mean_rel=0.07953853905200958, max_rel=2.612102746963501, norm_rel=0.024276072159409523, ref_abs_avg=19.84964942932129, test_abs_avg=19.90054702758789
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5965679287910461, max_abs=4.5, mean_rel=0.15027298033237457, max_rel=661.9041137695312, norm_rel=0.024211592972278595, ref_abs_avg=24.661380767822266, test_abs_avg=24.661577224731445
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5519084930419922, max_abs=4.0, mean_rel=0.2095799595117569, max_rel=1593.7498779296875, norm_rel=0.022665483877062798, ref_abs_avg=24.352123260498047, test_abs_avg=24.348064422607422
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4360237121582031, max_abs=1.75, mean_rel=0.07148059457540512, max_rel=2.312260627746582, norm_rel=0.02332821860909462, ref_abs_avg=18.710163116455078, test_abs_avg=18.679874420166016
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5715194344520569, max_abs=4.5, mean_rel=0.15255415439605713, max_rel=937.13623046875, norm_rel=0.023752402514219284, ref_abs_avg=24.070829391479492, test_abs_avg=24.072071075439453
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5276952385902405, max_abs=4.0, mean_rel=0.20172488689422607, max_rel=1593.7498779296875, norm_rel=0.02260354906320572, ref_abs_avg=23.38620376586914, test_abs_avg=23.38617706298828
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4112071990966797, max_abs=1.75, mean_rel=0.14190155267715454, max_rel=20.250242233276367, norm_rel=0.020852524787187576, ref_abs_avg=19.657333374023438, test_abs_avg=19.685014724731445
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5344825387001038, max_abs=4.0, mean_rel=0.1514051854610443, max_rel=938.02978515625, norm_rel=0.023524101823568344, ref_abs_avg=22.7923583984375, test_abs_avg=22.79294776916504
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.49788978695869446, max_abs=3.0, mean_rel=0.2594379484653473, max_rel=1531.2498779296875, norm_rel=0.022006670013070107, ref_abs_avg=22.691104888916016, test_abs_avg=22.69304656982422
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.42070913314819336, max_abs=1.875, mean_rel=0.09362979233264923, max_rel=9.823515892028809, norm_rel=0.02329966239631176, ref_abs_avg=18.563365936279297, test_abs_avg=18.540573120117188
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5198583602905273, max_abs=4.5, mean_rel=0.14815229177474976, max_rel=794.6265258789062, norm_rel=0.023164397105574608, ref_abs_avg=22.468917846679688, test_abs_avg=22.468475341796875
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.48586344718933105, max_abs=4.0, mean_rel=0.2174045443534851, max_rel=1312.4998779296875, norm_rel=0.02183697372674942, ref_abs_avg=22.269960403442383, test_abs_avg=22.280977249145508
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4484821557998657, max_abs=1.875, mean_rel=0.12507179379463196, max_rel=17.33913803100586, norm_rel=0.025150008499622345, ref_abs_avg=18.00374984741211, test_abs_avg=17.954082489013672
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5754395127296448, max_abs=4.5, mean_rel=0.16117897629737854, max_rel=994.8350830078125, norm_rel=0.024507666006684303, ref_abs_avg=23.518569946289062, test_abs_avg=23.517234802246094
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5353747010231018, max_abs=5.0, mean_rel=0.2263338416814804, max_rel=1843.7498779296875, norm_rel=0.023500435054302216, ref_abs_avg=22.891944885253906, test_abs_avg=22.89092254638672
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.40126585960388184, max_abs=1.5625, mean_rel=0.12167544662952423, max_rel=13.499299049377441, norm_rel=0.02309643104672432, ref_abs_avg=17.222578048706055, test_abs_avg=17.217336654663086
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5220412015914917, max_abs=5.0, mean_rel=0.15457335114479065, max_rel=791.1825561523438, norm_rel=0.024265309795737267, ref_abs_avg=21.525249481201172, test_abs_avg=21.52455711364746
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4906322658061981, max_abs=3.875, mean_rel=0.2058217078447342, max_rel=2125.0, norm_rel=0.02282104827463627, ref_abs_avg=21.586936950683594, test_abs_avg=21.58553695678711
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3720541000366211, max_abs=1.25, mean_rel=0.08590491116046906, max_rel=7.064377784729004, norm_rel=0.02098177745938301, ref_abs_avg=17.617557525634766, test_abs_avg=17.592510223388672
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.49737581610679626, max_abs=4.0, mean_rel=0.14282314479351044, max_rel=1305.5584716796875, norm_rel=0.023581285029649734, ref_abs_avg=21.1358585357666, test_abs_avg=21.135263442993164
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4520179033279419, max_abs=3.5, mean_rel=0.22496297955513, max_rel=1999.9998779296875, norm_rel=0.022044144570827484, ref_abs_avg=20.575767517089844, test_abs_avg=20.57438850402832
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3434257507324219, max_abs=1.25, mean_rel=0.07575103640556335, max_rel=4.13007116317749, norm_rel=0.019897909834980965, ref_abs_avg=17.041610717773438, test_abs_avg=17.025768280029297
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4650421738624573, max_abs=4.5, mean_rel=0.14143899083137512, max_rel=646.0388793945312, norm_rel=0.02319476753473282, ref_abs_avg=20.124649047851562, test_abs_avg=20.124462127685547
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.426725834608078, max_abs=4.0, mean_rel=0.17511330544948578, max_rel=1406.2498779296875, norm_rel=0.021315963938832283, ref_abs_avg=20.1032657623291, test_abs_avg=20.099655151367188
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.32433414459228516, max_abs=1.5, mean_rel=0.16821789741516113, max_rel=30.077985763549805, norm_rel=0.019391357898712158, ref_abs_avg=16.717979431152344, test_abs_avg=16.70746612548828
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.43578580021858215, max_abs=4.5, mean_rel=0.1430247277021408, max_rel=839.8114624023438, norm_rel=0.022466808557510376, ref_abs_avg=19.494869232177734, test_abs_avg=19.4950008392334
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.39140766859054565, max_abs=3.75, mean_rel=0.1880822628736496, max_rel=1148.4375, norm_rel=0.020622538402676582, ref_abs_avg=19.080242156982422, test_abs_avg=19.08562660217285
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3191201686859131, max_abs=1.125, mean_rel=0.08225926756858826, max_rel=4.160120010375977, norm_rel=0.020143860951066017, ref_abs_avg=15.506451606750488, test_abs_avg=15.519513130187988
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.41342443227767944, max_abs=5.0, mean_rel=0.13268835842609406, max_rel=501.4780578613281, norm_rel=0.022012809291481972, ref_abs_avg=18.93517303466797, test_abs_avg=18.935598373413086
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.36815810203552246, max_abs=4.0, mean_rel=0.19772548973560333, max_rel=3171.874755859375, norm_rel=0.0197971910238266, ref_abs_avg=18.755334854125977, test_abs_avg=18.762290954589844
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.29976749420166016, max_abs=1.1875, mean_rel=0.06494336575269699, max_rel=2.955892562866211, norm_rel=0.019727827981114388, ref_abs_avg=15.375432014465332, test_abs_avg=15.398670196533203
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3874800503253937, max_abs=5.0, mean_rel=0.1281326860189438, max_rel=731.4896850585938, norm_rel=0.021604642271995544, ref_abs_avg=18.162755966186523, test_abs_avg=18.162433624267578
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3572043776512146, max_abs=3.25, mean_rel=0.16779550909996033, max_rel=1062.5, norm_rel=0.020286129787564278, ref_abs_avg=17.86468505859375, test_abs_avg=17.871204376220703
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.29147815704345703, max_abs=1.125, mean_rel=0.05892101675271988, max_rel=1.7990961074829102, norm_rel=0.019530294463038445, ref_abs_avg=15.081235885620117, test_abs_avg=15.075974464416504
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3709295988082886, max_abs=4.5, mean_rel=0.1228572279214859, max_rel=393.6830749511719, norm_rel=0.02101237326860428, ref_abs_avg=17.938610076904297, test_abs_avg=17.938982009887695
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3224385976791382, max_abs=4.0, mean_rel=0.1585211455821991, max_rel=1085.9375, norm_rel=0.018345680087804794, ref_abs_avg=17.811050415039062, test_abs_avg=17.810321807861328
production_forward2 vs paper_forward output: mean_abs=0.0016313892556354403, max_abs=0.03515625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008360357023775578, max_abs=0.576171875, mean_rel=0.07242920249700546, max_rel=90.20568084716797, norm_rel=0.01983179710805416, ref_abs_avg=0.4571346640586853, test_abs_avg=0.45715320110321045
production_forward2 grad[1] vs paper_forward: mean_abs=7.229907035827637, max_abs=56.0, mean_rel=0.1238960400223732, max_rel=91.24006652832031, norm_rel=0.020703710615634918, ref_abs_avg=316.1289367675781, test_abs_avg=316.17559814453125
production_forward2 grad[2] vs paper_forward: mean_abs=1.271251916885376, max_abs=4.5, mean_rel=0.19192150235176086, max_rel=55.36253356933594, norm_rel=0.02345539815723896, ref_abs_avg=55.29058837890625, test_abs_avg=55.26341247558594
production_forward2 grad[3] vs paper_forward: mean_abs=1.6021825075149536, max_abs=10.5, mean_rel=0.1746152639389038, max_rel=2706.880859375, norm_rel=0.024435197934508324, ref_abs_avg=65.97898864746094, test_abs_avg=65.97982788085938
production_forward2 grad[4] vs paper_forward: mean_abs=1.4882960319519043, max_abs=8.125, mean_rel=0.39836597442626953, max_rel=4437.5, norm_rel=0.022948555648326874, ref_abs_avg=65.19099426269531, test_abs_avg=65.19401550292969
production_forward2 grad[5] vs paper_forward: mean_abs=1.113083839416504, max_abs=5.0, mean_rel=0.07292140275239944, max_rel=4.964556694030762, norm_rel=0.022231370210647583, ref_abs_avg=49.36550521850586, test_abs_avg=49.39129638671875
production_forward2 grad[6] vs paper_forward: mean_abs=1.414414405822754, max_abs=9.5, mean_rel=0.16736219823360443, max_rel=1091.5162353515625, norm_rel=0.02423880062997341, ref_abs_avg=58.7695198059082, test_abs_avg=58.771541595458984
production_forward2 grad[7] vs paper_forward: mean_abs=1.3085455894470215, max_abs=8.25, mean_rel=0.3503936529159546, max_rel=4500.0, norm_rel=0.022532889619469643, ref_abs_avg=58.43223571777344, test_abs_avg=58.43103790283203
production_forward2 grad[8] vs paper_forward: mean_abs=1.0581213235855103, max_abs=4.5, mean_rel=0.12304408848285675, max_rel=23.947429656982422, norm_rel=0.02359938621520996, ref_abs_avg=44.95466613769531, test_abs_avg=44.87084197998047
production_forward2 grad[9] vs paper_forward: mean_abs=1.2962498664855957, max_abs=8.625, mean_rel=0.16615428030490875, max_rel=2363.30029296875, norm_rel=0.024090245366096497, ref_abs_avg=54.13782501220703, test_abs_avg=54.139892578125
production_forward2 grad[10] vs paper_forward: mean_abs=1.194239616394043, max_abs=8.0, mean_rel=0.32466110587120056, max_rel=3249.999755859375, norm_rel=0.022386258468031883, ref_abs_avg=53.61656951904297, test_abs_avg=53.62010192871094
production_forward2 grad[11] vs paper_forward: mean_abs=0.9371986389160156, max_abs=4.0, mean_rel=0.08519857376813889, max_rel=3.461043357849121, norm_rel=0.022589463740587234, ref_abs_avg=41.002601623535156, test_abs_avg=40.97568130493164
production_forward2 grad[12] vs paper_forward: mean_abs=1.1878052949905396, max_abs=8.0, mean_rel=0.15416193008422852, max_rel=1843.07666015625, norm_rel=0.023940851911902428, ref_abs_avg=49.96805953979492, test_abs_avg=49.971920013427734
production_forward2 grad[13] vs paper_forward: mean_abs=1.099637746810913, max_abs=7.0, mean_rel=0.3604939579963684, max_rel=3437.499755859375, norm_rel=0.022371385246515274, ref_abs_avg=49.43346405029297, test_abs_avg=49.433509826660156
production_forward2 grad[14] vs paper_forward: mean_abs=0.8219413757324219, max_abs=3.0, mean_rel=0.06527645885944366, max_rel=3.405165195465088, norm_rel=0.020168354734778404, ref_abs_avg=41.897430419921875, test_abs_avg=41.918094635009766
production_forward2 grad[15] vs paper_forward: mean_abs=1.1073874235153198, max_abs=9.0, mean_rel=0.16940081119537354, max_rel=1484.398193359375, norm_rel=0.023688437417149544, ref_abs_avg=47.058204650878906, test_abs_avg=47.062538146972656
production_forward2 grad[16] vs paper_forward: mean_abs=1.0171289443969727, max_abs=6.5, mean_rel=0.3558659851551056, max_rel=4031.249755859375, norm_rel=0.021962113678455353, ref_abs_avg=46.60114288330078, test_abs_avg=46.609622955322266
production_forward2 grad[17] vs paper_forward: mean_abs=0.8068246841430664, max_abs=3.125, mean_rel=0.08218354731798172, max_rel=6.825982093811035, norm_rel=0.022847164422273636, ref_abs_avg=35.42210006713867, test_abs_avg=35.54997634887695
production_forward2 grad[18] vs paper_forward: mean_abs=1.044722557067871, max_abs=10.0, mean_rel=0.16429197788238525, max_rel=1943.1431884765625, norm_rel=0.023598289117217064, ref_abs_avg=44.58131408691406, test_abs_avg=44.584007263183594
production_forward2 grad[19] vs paper_forward: mean_abs=0.9560554623603821, max_abs=6.0, mean_rel=0.3328208029270172, max_rel=3187.499755859375, norm_rel=0.021891599521040916, ref_abs_avg=43.953468322753906, test_abs_avg=43.957366943359375
production_forward2 grad[20] vs paper_forward: mean_abs=0.7855392694473267, max_abs=3.75, mean_rel=0.13930097222328186, max_rel=10.289754867553711, norm_rel=0.02296854369342327, ref_abs_avg=33.45268630981445, test_abs_avg=33.477516174316406
production_forward2 grad[21] vs paper_forward: mean_abs=0.9917170405387878, max_abs=8.0, mean_rel=0.15550044178962708, max_rel=1928.2315673828125, norm_rel=0.023535335436463356, ref_abs_avg=42.37987518310547, test_abs_avg=42.378623962402344
production_forward2 grad[22] vs paper_forward: mean_abs=0.906889796257019, max_abs=5.5, mean_rel=0.2681710720062256, max_rel=2343.75, norm_rel=0.021659046411514282, ref_abs_avg=42.093833923339844, test_abs_avg=42.095428466796875
production_forward2 grad[23] vs paper_forward: mean_abs=0.7212333679199219, max_abs=3.375, mean_rel=0.06501206755638123, max_rel=6.47716760635376, norm_rel=0.021727025508880615, ref_abs_avg=34.67110824584961, test_abs_avg=34.661720275878906
production_forward2 grad[24] vs paper_forward: mean_abs=0.9446815848350525, max_abs=7.0, mean_rel=0.1549738496541977, max_rel=1752.0164794921875, norm_rel=0.023368095979094505, ref_abs_avg=40.68248748779297, test_abs_avg=40.68642044067383
production_forward2 grad[25] vs paper_forward: mean_abs=0.8639928102493286, max_abs=5.125, mean_rel=0.2965167164802551, max_rel=3343.749755859375, norm_rel=0.021672679111361504, ref_abs_avg=40.059486389160156, test_abs_avg=40.06260681152344
production_forward2 grad[26] vs paper_forward: mean_abs=0.8376836776733398, max_abs=3.875, mean_rel=0.1342201828956604, max_rel=24.726665496826172, norm_rel=0.02393110655248165, ref_abs_avg=35.759674072265625, test_abs_avg=35.719032287597656
production_forward2 grad[27] vs paper_forward: mean_abs=1.0806834697723389, max_abs=7.0, mean_rel=0.17650535702705383, max_rel=1617.5684814453125, norm_rel=0.025285694748163223, ref_abs_avg=42.96768569946289, test_abs_avg=42.972869873046875
production_forward2 grad[28] vs paper_forward: mean_abs=1.0053942203521729, max_abs=6.5, mean_rel=0.36379680037498474, max_rel=2937.499755859375, norm_rel=0.023597730323672295, ref_abs_avg=42.75102233886719, test_abs_avg=42.76754379272461
production_forward2 grad[29] vs paper_forward: mean_abs=0.7820663452148438, max_abs=3.0, mean_rel=0.124895378947258, max_rel=10.393881797790527, norm_rel=0.02434229850769043, ref_abs_avg=31.591154098510742, test_abs_avg=31.673290252685547
production_forward2 grad[30] vs paper_forward: mean_abs=0.993965208530426, max_abs=7.25, mean_rel=0.16599109768867493, max_rel=1610.7845458984375, norm_rel=0.025456368923187256, ref_abs_avg=39.21950912475586, test_abs_avg=39.22383499145508
production_forward2 grad[31] vs paper_forward: mean_abs=0.9272017478942871, max_abs=6.0, mean_rel=0.28127408027648926, max_rel=3187.499755859375, norm_rel=0.02409372851252556, ref_abs_avg=38.627723693847656, test_abs_avg=38.621307373046875
production_forward2 grad[32] vs paper_forward: mean_abs=0.7309741973876953, max_abs=3.5, mean_rel=0.07352983951568604, max_rel=3.8389322757720947, norm_rel=0.02405511774122715, ref_abs_avg=30.683849334716797, test_abs_avg=30.616159439086914
production_forward2 grad[33] vs paper_forward: mean_abs=0.9268850088119507, max_abs=7.0, mean_rel=0.15828251838684082, max_rel=904.5311279296875, norm_rel=0.025316692888736725, ref_abs_avg=36.809783935546875, test_abs_avg=36.811805725097656
production_forward2 grad[34] vs paper_forward: mean_abs=0.8647952079772949, max_abs=6.0, mean_rel=0.28222236037254333, max_rel=2624.999755859375, norm_rel=0.0238359235227108, ref_abs_avg=36.37580108642578, test_abs_avg=36.382869720458984
production_forward2 grad[35] vs paper_forward: mean_abs=0.6727193593978882, max_abs=2.875, mean_rel=0.1874871402978897, max_rel=22.69371795654297, norm_rel=0.023911457508802414, ref_abs_avg=28.662532806396484, test_abs_avg=28.64640998840332
production_forward2 grad[36] vs paper_forward: mean_abs=0.8795458078384399, max_abs=6.0, mean_rel=0.16198988258838654, max_rel=1255.30126953125, norm_rel=0.025266895070672035, ref_abs_avg=34.98015594482422, test_abs_avg=34.98418426513672
production_forward2 grad[37] vs paper_forward: mean_abs=0.8060752153396606, max_abs=5.5, mean_rel=0.2915588617324829, max_rel=2624.999755859375, norm_rel=0.02344924584031105, ref_abs_avg=34.52879333496094, test_abs_avg=34.52411651611328
production_forward2 grad[38] vs paper_forward: mean_abs=0.6068005561828613, max_abs=2.75, mean_rel=0.18486414849758148, max_rel=35.956268310546875, norm_rel=0.022715186700224876, ref_abs_avg=26.92026138305664, test_abs_avg=26.951061248779297
production_forward2 grad[39] vs paper_forward: mean_abs=0.825076699256897, max_abs=6.5, mean_rel=0.15465039014816284, max_rel=1643.2977294921875, norm_rel=0.024767527356743813, ref_abs_avg=33.438331604003906, test_abs_avg=33.439414978027344
production_forward2 grad[40] vs paper_forward: mean_abs=0.7645418047904968, max_abs=5.71875, mean_rel=0.2379835546016693, max_rel=1843.7498779296875, norm_rel=0.023423120379447937, ref_abs_avg=32.76579284667969, test_abs_avg=32.77369689941406
production_forward2 grad[41] vs paper_forward: mean_abs=0.6067581176757812, max_abs=2.125, mean_rel=0.0675916075706482, max_rel=3.453254222869873, norm_rel=0.021817686036229134, ref_abs_avg=27.9495849609375, test_abs_avg=27.98029327392578
production_forward2 grad[42] vs paper_forward: mean_abs=0.7780647277832031, max_abs=6.0, mean_rel=0.16779474914073944, max_rel=1239.3958740234375, norm_rel=0.02479618601500988, ref_abs_avg=31.503814697265625, test_abs_avg=31.50368881225586
production_forward2 grad[43] vs paper_forward: mean_abs=0.7206192016601562, max_abs=4.5, mean_rel=0.251176655292511, max_rel=2375.0, norm_rel=0.023181216791272163, ref_abs_avg=31.1804256439209, test_abs_avg=31.189128875732422
production_forward2 grad[44] vs paper_forward: mean_abs=0.6022922992706299, max_abs=2.40625, mean_rel=0.11188706010580063, max_rel=12.884599685668945, norm_rel=0.02334008365869522, ref_abs_avg=25.768295288085938, test_abs_avg=25.755529403686523
production_forward2 grad[45] vs paper_forward: mean_abs=0.7402132749557495, max_abs=6.0, mean_rel=0.15097060799598694, max_rel=589.0875244140625, norm_rel=0.024383801966905594, ref_abs_avg=30.45500946044922, test_abs_avg=30.458669662475586
production_forward2 grad[46] vs paper_forward: mean_abs=0.6896489858627319, max_abs=4.5, mean_rel=0.2426171451807022, max_rel=1843.7498779296875, norm_rel=0.02322925068438053, ref_abs_avg=29.782922744750977, test_abs_avg=29.786386489868164
production_forward2 grad[47] vs paper_forward: mean_abs=0.5921297073364258, max_abs=2.125, mean_rel=0.16252441704273224, max_rel=16.540922164916992, norm_rel=0.024778246879577637, ref_abs_avg=23.543964385986328, test_abs_avg=23.51601791381836
production_forward2 grad[48] vs paper_forward: mean_abs=0.7126996517181396, max_abs=5.0, mean_rel=0.16143333911895752, max_rel=1399.2972412109375, norm_rel=0.024117032065987587, ref_abs_avg=29.640274047851562, test_abs_avg=29.643701553344727
production_forward2 grad[49] vs paper_forward: mean_abs=0.6576024293899536, max_abs=4.40625, mean_rel=0.2914494276046753, max_rel=2999.999755859375, norm_rel=0.022606829181313515, ref_abs_avg=29.134721755981445, test_abs_avg=29.135408401489258
production_forward2 grad[50] vs paper_forward: mean_abs=0.6236977577209473, max_abs=2.28125, mean_rel=0.14426285028457642, max_rel=22.958362579345703, norm_rel=0.02485617808997631, ref_abs_avg=25.504371643066406, test_abs_avg=25.592693328857422
production_forward2 grad[51] vs paper_forward: mean_abs=0.7824324369430542, max_abs=5.5, mean_rel=0.16814231872558594, max_rel=1526.4857177734375, norm_rel=0.025518128648400307, ref_abs_avg=30.717647552490234, test_abs_avg=30.719627380371094
production_forward2 grad[52] vs paper_forward: mean_abs=0.7363066673278809, max_abs=5.0, mean_rel=0.2832707464694977, max_rel=2406.25, norm_rel=0.0243818536400795, ref_abs_avg=30.348173141479492, test_abs_avg=30.35786247253418
production_forward2 grad[53] vs paper_forward: mean_abs=0.5503036975860596, max_abs=2.40625, mean_rel=0.13545483350753784, max_rel=11.206161499023438, norm_rel=0.023859873414039612, ref_abs_avg=23.157258987426758, test_abs_avg=23.16675567626953
production_forward2 grad[54] vs paper_forward: mean_abs=0.7188122272491455, max_abs=5.0, mean_rel=0.15282192826271057, max_rel=858.67333984375, norm_rel=0.025247154757380486, ref_abs_avg=28.569644927978516, test_abs_avg=28.570791244506836
production_forward2 grad[55] vs paper_forward: mean_abs=0.6692876219749451, max_abs=4.703125, mean_rel=0.26527345180511475, max_rel=2343.75, norm_rel=0.023871060460805893, ref_abs_avg=28.11749267578125, test_abs_avg=28.1195011138916
production_forward2 grad[56] vs paper_forward: mean_abs=0.5431421399116516, max_abs=2.5, mean_rel=0.12052010744810104, max_rel=19.008771896362305, norm_rel=0.024558329954743385, ref_abs_avg=22.331819534301758, test_abs_avg=22.358386993408203
production_forward2 grad[57] vs paper_forward: mean_abs=0.6672368049621582, max_abs=5.0, mean_rel=0.15374541282653809, max_rel=1143.6292724609375, norm_rel=0.024722924456000328, ref_abs_avg=27.045833587646484, test_abs_avg=27.049001693725586
production_forward2 grad[58] vs paper_forward: mean_abs=0.6199169158935547, max_abs=4.0, mean_rel=0.2474198192358017, max_rel=1781.2498779296875, norm_rel=0.023248059675097466, ref_abs_avg=26.65519905090332, test_abs_avg=26.6595516204834
production_forward2 grad[59] vs paper_forward: mean_abs=0.45937395095825195, max_abs=1.875, mean_rel=0.09404498338699341, max_rel=10.772417068481445, norm_rel=0.020653335377573967, ref_abs_avg=22.538625717163086, test_abs_avg=22.527690887451172
production_forward2 grad[60] vs paper_forward: mean_abs=0.6254257559776306, max_abs=4.5, mean_rel=0.16350682079792023, max_rel=1382.436279296875, norm_rel=0.02430075779557228, ref_abs_avg=25.78514862060547, test_abs_avg=25.786434173583984
production_forward2 grad[61] vs paper_forward: mean_abs=0.5771828889846802, max_abs=4.25, mean_rel=0.2229519784450531, max_rel=1468.7498779296875, norm_rel=0.022848907858133316, ref_abs_avg=25.269500732421875, test_abs_avg=25.272714614868164
production_forward2 grad[62] vs paper_forward: mean_abs=0.4709959030151367, max_abs=1.921875, mean_rel=0.08542311191558838, max_rel=3.2811179161071777, norm_rel=0.024108219891786575, ref_abs_avg=19.84964942932129, test_abs_avg=19.896900177001953
production_forward2 grad[63] vs paper_forward: mean_abs=0.5893779993057251, max_abs=4.0, mean_rel=0.14785289764404297, max_rel=886.6626586914062, norm_rel=0.023928403854370117, ref_abs_avg=24.661380767822266, test_abs_avg=24.66254997253418
production_forward2 grad[64] vs paper_forward: mean_abs=0.5445737838745117, max_abs=4.0, mean_rel=0.2083345353603363, max_rel=1960.9373779296875, norm_rel=0.0223789494484663, ref_abs_avg=24.352123260498047, test_abs_avg=24.349353790283203
production_forward2 grad[65] vs paper_forward: mean_abs=0.4510459899902344, max_abs=1.9375, mean_rel=0.0746384859085083, max_rel=2.038895845413208, norm_rel=0.024041462689638138, ref_abs_avg=18.710163116455078, test_abs_avg=18.682571411132812
production_forward2 grad[66] vs paper_forward: mean_abs=0.5652018785476685, max_abs=4.5, mean_rel=0.1479724496603012, max_rel=803.509765625, norm_rel=0.023505793884396553, ref_abs_avg=24.070829391479492, test_abs_avg=24.07195281982422
production_forward2 grad[67] vs paper_forward: mean_abs=0.5201119780540466, max_abs=4.0, mean_rel=0.20278400182724, max_rel=1468.7498779296875, norm_rel=0.022284243255853653, ref_abs_avg=23.38620376586914, test_abs_avg=23.39105796813965
production_forward2 grad[68] vs paper_forward: mean_abs=0.39992475509643555, max_abs=1.75, mean_rel=0.13022232055664062, max_rel=18.182132720947266, norm_rel=0.02046906016767025, ref_abs_avg=19.657333374023438, test_abs_avg=19.66166114807129
production_forward2 grad[69] vs paper_forward: mean_abs=0.5296616554260254, max_abs=5.0, mean_rel=0.14650721848011017, max_rel=1294.067138671875, norm_rel=0.0233091339468956, ref_abs_avg=22.7923583984375, test_abs_avg=22.793577194213867
production_forward2 grad[70] vs paper_forward: mean_abs=0.49223196506500244, max_abs=3.5, mean_rel=0.2454877495765686, max_rel=1453.1248779296875, norm_rel=0.02175818383693695, ref_abs_avg=22.691104888916016, test_abs_avg=22.69333267211914
production_forward2 grad[71] vs paper_forward: mean_abs=0.40819549560546875, max_abs=2.0, mean_rel=0.06920896470546722, max_rel=2.7646796703338623, norm_rel=0.022432349622249603, ref_abs_avg=18.563365936279297, test_abs_avg=18.537857055664062
production_forward2 grad[72] vs paper_forward: mean_abs=0.5159823298454285, max_abs=4.0, mean_rel=0.14392060041427612, max_rel=895.0335083007812, norm_rel=0.022997237741947174, ref_abs_avg=22.468917846679688, test_abs_avg=22.469959259033203
production_forward2 grad[73] vs paper_forward: mean_abs=0.479347825050354, max_abs=4.125, mean_rel=0.21302610635757446, max_rel=1765.6248779296875, norm_rel=0.021539034321904182, ref_abs_avg=22.269960403442383, test_abs_avg=22.2828311920166
production_forward2 grad[74] vs paper_forward: mean_abs=0.4391845464706421, max_abs=2.125, mean_rel=0.1849086582660675, max_rel=43.76068115234375, norm_rel=0.02485240250825882, ref_abs_avg=18.00374984741211, test_abs_avg=17.964996337890625
production_forward2 grad[75] vs paper_forward: mean_abs=0.568304717540741, max_abs=4.5, mean_rel=0.16264963150024414, max_rel=902.0363159179688, norm_rel=0.02421688660979271, ref_abs_avg=23.518569946289062, test_abs_avg=23.518871307373047
production_forward2 grad[76] vs paper_forward: mean_abs=0.5266096591949463, max_abs=3.5, mean_rel=0.2154485285282135, max_rel=1687.4998779296875, norm_rel=0.023115966469049454, ref_abs_avg=22.891944885253906, test_abs_avg=22.895641326904297
production_forward2 grad[77] vs paper_forward: mean_abs=0.3953573703765869, max_abs=1.546875, mean_rel=0.16003447771072388, max_rel=23.987743377685547, norm_rel=0.022811459377408028, ref_abs_avg=17.222578048706055, test_abs_avg=17.231029510498047
production_forward2 grad[78] vs paper_forward: mean_abs=0.5165510177612305, max_abs=6.0, mean_rel=0.15445753931999207, max_rel=784.023681640625, norm_rel=0.024031352251768112, ref_abs_avg=21.525249481201172, test_abs_avg=21.525390625
production_forward2 grad[79] vs paper_forward: mean_abs=0.48478609323501587, max_abs=4.5, mean_rel=0.1964009553194046, max_rel=1343.7498779296875, norm_rel=0.022552894428372383, ref_abs_avg=21.586936950683594, test_abs_avg=21.58814239501953
production_forward2 grad[80] vs paper_forward: mean_abs=0.3940887451171875, max_abs=1.625, mean_rel=0.08560138195753098, max_rel=5.837981700897217, norm_rel=0.022299449890851974, ref_abs_avg=17.617557525634766, test_abs_avg=17.58450698852539
production_forward2 grad[81] vs paper_forward: mean_abs=0.4928298592567444, max_abs=5.0, mean_rel=0.14090682566165924, max_rel=800.2783813476562, norm_rel=0.023380892351269722, ref_abs_avg=21.1358585357666, test_abs_avg=21.136211395263672
production_forward2 grad[82] vs paper_forward: mean_abs=0.44720470905303955, max_abs=3.75, mean_rel=0.2287541627883911, max_rel=1968.7498779296875, norm_rel=0.021772772073745728, ref_abs_avg=20.575767517089844, test_abs_avg=20.57564926147461
production_forward2 grad[83] vs paper_forward: mean_abs=0.33385777473449707, max_abs=1.375, mean_rel=0.0700274407863617, max_rel=2.7372262477874756, norm_rel=0.019712474197149277, ref_abs_avg=17.041610717773438, test_abs_avg=17.026565551757812
production_forward2 grad[84] vs paper_forward: mean_abs=0.4614180028438568, max_abs=4.25, mean_rel=0.1401917189359665, max_rel=680.0582275390625, norm_rel=0.023030007258057594, ref_abs_avg=20.124649047851562, test_abs_avg=20.125041961669922
production_forward2 grad[85] vs paper_forward: mean_abs=0.4206022024154663, max_abs=4.0, mean_rel=0.17964306473731995, max_rel=1374.9998779296875, norm_rel=0.021027822047472, ref_abs_avg=20.1032657623291, test_abs_avg=20.102519989013672
production_forward2 grad[86] vs paper_forward: mean_abs=0.3278985023498535, max_abs=1.5625, mean_rel=0.17684265971183777, max_rel=36.745445251464844, norm_rel=0.019335957244038582, ref_abs_avg=16.717979431152344, test_abs_avg=16.704288482666016
production_forward2 grad[87] vs paper_forward: mean_abs=0.43260106444358826, max_abs=4.75, mean_rel=0.1442503035068512, max_rel=1479.7857666015625, norm_rel=0.022314952686429024, ref_abs_avg=19.494869232177734, test_abs_avg=19.49595832824707
production_forward2 grad[88] vs paper_forward: mean_abs=0.3890379071235657, max_abs=3.625, mean_rel=0.18372918665409088, max_rel=1250.0, norm_rel=0.020519398152828217, ref_abs_avg=19.080242156982422, test_abs_avg=19.086559295654297
production_forward2 grad[89] vs paper_forward: mean_abs=0.31422853469848633, max_abs=1.125, mean_rel=0.07113969326019287, max_rel=3.315002679824829, norm_rel=0.019842514768242836, ref_abs_avg=15.506451606750488, test_abs_avg=15.523185729980469
production_forward2 grad[90] vs paper_forward: mean_abs=0.41116324067115784, max_abs=4.5, mean_rel=0.1322895884513855, max_rel=526.280517578125, norm_rel=0.021898765116930008, ref_abs_avg=18.93517303466797, test_abs_avg=18.93642807006836
production_forward2 grad[91] vs paper_forward: mean_abs=0.3659934401512146, max_abs=4.0, mean_rel=0.1876693069934845, max_rel=2187.5, norm_rel=0.019648950546979904, ref_abs_avg=18.755334854125977, test_abs_avg=18.76229476928711
production_forward2 grad[92] vs paper_forward: mean_abs=0.2866753339767456, max_abs=1.25, mean_rel=0.07143682986497879, max_rel=3.6577136516571045, norm_rel=0.019425395876169205, ref_abs_avg=15.375432014465332, test_abs_avg=15.41140365600586
production_forward2 grad[93] vs paper_forward: mean_abs=0.3855721652507782, max_abs=5.0, mean_rel=0.12654301524162292, max_rel=745.4207763671875, norm_rel=0.02152019366621971, ref_abs_avg=18.162755966186523, test_abs_avg=18.16251564025879
production_forward2 grad[94] vs paper_forward: mean_abs=0.3565310835838318, max_abs=3.75, mean_rel=0.1711653620004654, max_rel=945.3124389648438, norm_rel=0.020335860550403595, ref_abs_avg=17.86468505859375, test_abs_avg=17.867502212524414
production_forward2 grad[95] vs paper_forward: mean_abs=0.3032264709472656, max_abs=1.375, mean_rel=0.058231666684150696, max_rel=1.4242360591888428, norm_rel=0.019931627437472343, ref_abs_avg=15.081235885620117, test_abs_avg=15.061223983764648
production_forward2 grad[96] vs paper_forward: mean_abs=0.3705568313598633, max_abs=4.5, mean_rel=0.12291060388088226, max_rel=476.0748596191406, norm_rel=0.020971911028027534, ref_abs_avg=17.938610076904297, test_abs_avg=17.939157485961914
production_forward2 grad[97] vs paper_forward: mean_abs=0.32232263684272766, max_abs=4.5, mean_rel=0.15603779256343842, max_rel=999.9999389648438, norm_rel=0.018378274515271187, ref_abs_avg=17.811050415039062, test_abs_avg=17.809986114501953
identity layers + randn queries
production_forward2 fwd+bwd:  113.634 ms
production_forward2 bwd-only: 96.030 ms
production_forward2 peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=2.326 GiB, fwd+bwd=10.326 GiB
paper_forward fwd+bwd:  381.675 ms
paper_forward bwd-only: 301.517 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.744 GiB, fwd+bwd=32.494 GiB
production_forward fwd+bwd:  114.394 ms
production_forward bwd-only: 96.011 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.326 GiB, fwd+bwd=10.326 GiB
torch_compile_phases_forward fwd+bwd:  167.074 ms
torch_compile_phases_forward bwd-only: 132.651 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016474537551403046, max_abs=0.03515625
production_forward grad[0] vs paper_forward: mean_abs=0.008428329601883888, max_abs=0.34375, mean_rel=0.07201394438743591, max_rel=156.1593017578125, norm_rel=0.01970323920249939, ref_abs_avg=0.4651203751564026, test_abs_avg=0.465143620967865
production_forward grad[1] vs paper_forward: mean_abs=7.326257228851318, max_abs=56.0, mean_rel=0.14497505128383636, max_rel=193.55484008789062, norm_rel=0.020324572920799255, ref_abs_avg=320.385498046875, test_abs_avg=320.4141845703125
production_forward grad[2] vs paper_forward: mean_abs=1.2882776260375977, max_abs=4.75, mean_rel=0.1626555174589157, max_rel=26.58014488220215, norm_rel=0.024343719705939293, ref_abs_avg=51.683963775634766, test_abs_avg=51.628509521484375
production_forward grad[3] vs paper_forward: mean_abs=1.6319026947021484, max_abs=12.0, mean_rel=0.17638181149959564, max_rel=3421.9482421875, norm_rel=0.02443384937942028, ref_abs_avg=67.28092956542969, test_abs_avg=67.28878784179688
production_forward grad[4] vs paper_forward: mean_abs=1.5059657096862793, max_abs=9.5, mean_rel=0.3927660882472992, max_rel=6624.99951171875, norm_rel=0.022865118458867073, ref_abs_avg=66.2819595336914, test_abs_avg=66.28575134277344
production_forward grad[5] vs paper_forward: mean_abs=1.078756332397461, max_abs=4.0, mean_rel=0.11450675874948502, max_rel=11.645564079284668, norm_rel=0.023246759548783302, ref_abs_avg=47.24372863769531, test_abs_avg=47.268226623535156
production_forward grad[6] vs paper_forward: mean_abs=1.4291366338729858, max_abs=9.25, mean_rel=0.1575835943222046, max_rel=2077.6572265625, norm_rel=0.0242433100938797, ref_abs_avg=59.3345947265625, test_abs_avg=59.343299865722656
production_forward grad[7] vs paper_forward: mean_abs=1.311851978302002, max_abs=8.0, mean_rel=0.37080109119415283, max_rel=4125.0, norm_rel=0.022445550188422203, ref_abs_avg=58.82807159423828, test_abs_avg=58.837493896484375
production_forward grad[8] vs paper_forward: mean_abs=1.0446711778640747, max_abs=3.75, mean_rel=0.16295680403709412, max_rel=13.94681167602539, norm_rel=0.023762017488479614, ref_abs_avg=43.19041442871094, test_abs_avg=43.270790100097656
production_forward grad[9] vs paper_forward: mean_abs=1.2884612083435059, max_abs=11.0, mean_rel=0.16635318100452423, max_rel=2002.807861328125, norm_rel=0.023964600637555122, ref_abs_avg=54.085750579833984, test_abs_avg=54.090171813964844
production_forward grad[10] vs paper_forward: mean_abs=1.1773934364318848, max_abs=6.75, mean_rel=0.34480851888656616, max_rel=4031.249755859375, norm_rel=0.022201478481292725, ref_abs_avg=53.34233093261719, test_abs_avg=53.350608825683594
production_forward grad[11] vs paper_forward: mean_abs=0.8985996246337891, max_abs=4.625, mean_rel=0.07911416888237, max_rel=5.129215717315674, norm_rel=0.02146511897444725, ref_abs_avg=41.702728271484375, test_abs_avg=41.7352180480957
production_forward grad[12] vs paper_forward: mean_abs=1.1851798295974731, max_abs=8.0, mean_rel=0.1514866203069687, max_rel=912.7742919921875, norm_rel=0.023789238184690475, ref_abs_avg=50.153404235839844, test_abs_avg=50.15453338623047
production_forward grad[13] vs paper_forward: mean_abs=1.0866889953613281, max_abs=6.25, mean_rel=0.37001290917396545, max_rel=3281.249755859375, norm_rel=0.022082902491092682, ref_abs_avg=49.49165725708008, test_abs_avg=49.49570846557617
production_forward grad[14] vs paper_forward: mean_abs=0.8376445770263672, max_abs=3.1875, mean_rel=0.08660368621349335, max_rel=5.926448822021484, norm_rel=0.02173084206879139, ref_abs_avg=38.715599060058594, test_abs_avg=38.69419479370117
production_forward grad[15] vs paper_forward: mean_abs=1.1120975017547607, max_abs=7.5, mean_rel=0.1664825975894928, max_rel=1815.382568359375, norm_rel=0.023563271388411522, ref_abs_avg=47.486175537109375, test_abs_avg=47.488304138183594
production_forward grad[16] vs paper_forward: mean_abs=1.0219444036483765, max_abs=6.25, mean_rel=0.37130218744277954, max_rel=3249.999755859375, norm_rel=0.021933751180768013, ref_abs_avg=46.840919494628906, test_abs_avg=46.83921432495117
production_forward grad[17] vs paper_forward: mean_abs=0.8374319076538086, max_abs=3.25, mean_rel=0.14589127898216248, max_rel=12.770248413085938, norm_rel=0.023135628551244736, ref_abs_avg=36.10573959350586, test_abs_avg=36.1253662109375
production_forward grad[18] vs paper_forward: mean_abs=1.0474050045013428, max_abs=7.0, mean_rel=0.15620866417884827, max_rel=2198.346435546875, norm_rel=0.023465843871235847, ref_abs_avg=44.93026351928711, test_abs_avg=44.93057632446289
production_forward grad[19] vs paper_forward: mean_abs=0.9562181234359741, max_abs=5.75, mean_rel=0.3114278316497803, max_rel=2999.999755859375, norm_rel=0.02180556207895279, ref_abs_avg=44.11818313598633, test_abs_avg=44.12123107910156
production_forward grad[20] vs paper_forward: mean_abs=0.7504551410675049, max_abs=2.5, mean_rel=0.08024363219738007, max_rel=7.2780866622924805, norm_rel=0.02143731527030468, ref_abs_avg=35.08606719970703, test_abs_avg=35.06700134277344
production_forward grad[21] vs paper_forward: mean_abs=0.9881983995437622, max_abs=7.25, mean_rel=0.15158414840698242, max_rel=1360.4080810546875, norm_rel=0.023435518145561218, ref_abs_avg=42.40650177001953, test_abs_avg=42.41136169433594
production_forward grad[22] vs paper_forward: mean_abs=0.9006907939910889, max_abs=5.84375, mean_rel=0.26940709352493286, max_rel=3218.749755859375, norm_rel=0.021918637678027153, ref_abs_avg=41.304840087890625, test_abs_avg=41.30406951904297
production_forward grad[23] vs paper_forward: mean_abs=0.6862545013427734, max_abs=2.53125, mean_rel=0.07410527765750885, max_rel=6.629757881164551, norm_rel=0.02092505805194378, ref_abs_avg=33.27883529663086, test_abs_avg=33.244483947753906
production_forward grad[24] vs paper_forward: mean_abs=0.9420225024223328, max_abs=7.0, mean_rel=0.15024858713150024, max_rel=1564.4876708984375, norm_rel=0.02326606586575508, ref_abs_avg=40.760215759277344, test_abs_avg=40.76215362548828
production_forward grad[25] vs paper_forward: mean_abs=0.8653088808059692, max_abs=5.09375, mean_rel=0.30684587359428406, max_rel=2500.0, norm_rel=0.021761151030659676, ref_abs_avg=39.939151763916016, test_abs_avg=39.94526672363281
production_forward grad[26] vs paper_forward: mean_abs=0.8560327291488647, max_abs=3.46875, mean_rel=0.08107890188694, max_rel=3.714089870452881, norm_rel=0.02375800907611847, ref_abs_avg=36.96164321899414, test_abs_avg=36.929447174072266
production_forward grad[27] vs paper_forward: mean_abs=1.1045684814453125, max_abs=8.0, mean_rel=0.17695939540863037, max_rel=2678.521240234375, norm_rel=0.025094211101531982, ref_abs_avg=44.319976806640625, test_abs_avg=44.32276916503906
production_forward grad[28] vs paper_forward: mean_abs=1.0201131105422974, max_abs=7.0, mean_rel=0.34505540132522583, max_rel=3187.499755859375, norm_rel=0.02339993044734001, ref_abs_avg=43.80104446411133, test_abs_avg=43.807701110839844
production_forward grad[29] vs paper_forward: mean_abs=0.8280737400054932, max_abs=3.0, mean_rel=0.16474369168281555, max_rel=20.96889305114746, norm_rel=0.026967329904437065, ref_abs_avg=30.712608337402344, test_abs_avg=30.687854766845703
production_forward grad[30] vs paper_forward: mean_abs=1.0255470275878906, max_abs=7.5, mean_rel=0.16216567158699036, max_rel=1291.0137939453125, norm_rel=0.025425592437386513, ref_abs_avg=40.515968322753906, test_abs_avg=40.52092742919922
production_forward grad[31] vs paper_forward: mean_abs=0.9610975980758667, max_abs=6.65625, mean_rel=0.3761042654514313, max_rel=4156.25, norm_rel=0.02390921674668789, ref_abs_avg=40.3096923828125, test_abs_avg=40.317344665527344
production_forward grad[32] vs paper_forward: mean_abs=0.727543830871582, max_abs=3.0, mean_rel=0.09235252439975739, max_rel=5.418854713439941, norm_rel=0.024433063343167305, ref_abs_avg=29.953933715820312, test_abs_avg=29.899127960205078
production_forward grad[33] vs paper_forward: mean_abs=0.9455820322036743, max_abs=6.0, mean_rel=0.16270336508750916, max_rel=921.736083984375, norm_rel=0.025213666260242462, ref_abs_avg=37.66570281982422, test_abs_avg=37.668601989746094
production_forward grad[34] vs paper_forward: mean_abs=0.8843206167221069, max_abs=5.25, mean_rel=0.3287442922592163, max_rel=3499.999755859375, norm_rel=0.02371043525636196, ref_abs_avg=37.35264587402344, test_abs_avg=37.360198974609375
production_forward grad[35] vs paper_forward: mean_abs=0.6696498394012451, max_abs=2.75, mean_rel=0.0813673585653305, max_rel=4.923919200897217, norm_rel=0.023134900256991386, ref_abs_avg=29.43256950378418, test_abs_avg=29.420875549316406
production_forward grad[36] vs paper_forward: mean_abs=0.8947149515151978, max_abs=6.0, mean_rel=0.16922146081924438, max_rel=1642.531494140625, norm_rel=0.024983569979667664, ref_abs_avg=35.95185089111328, test_abs_avg=35.949337005615234
production_forward grad[37] vs paper_forward: mean_abs=0.830243706703186, max_abs=5.25, mean_rel=0.2766008675098419, max_rel=2437.5, norm_rel=0.023580366745591164, ref_abs_avg=35.345184326171875, test_abs_avg=35.343482971191406
production_forward grad[38] vs paper_forward: mean_abs=0.6453170776367188, max_abs=2.625, mean_rel=0.09498566389083862, max_rel=5.693464279174805, norm_rel=0.022795790806412697, ref_abs_avg=28.12024688720703, test_abs_avg=28.173126220703125
production_forward grad[39] vs paper_forward: mean_abs=0.8376234769821167, max_abs=7.0, mean_rel=0.1655673235654831, max_rel=1326.944580078125, norm_rel=0.024832382798194885, ref_abs_avg=33.833045959472656, test_abs_avg=33.83302307128906
production_forward grad[40] vs paper_forward: mean_abs=0.7814157009124756, max_abs=4.5, mean_rel=0.2633667588233948, max_rel=2093.75, norm_rel=0.02352430485188961, ref_abs_avg=33.315223693847656, test_abs_avg=33.31623840332031
production_forward grad[41] vs paper_forward: mean_abs=0.6175985336303711, max_abs=2.125, mean_rel=0.20230931043624878, max_rel=55.76170349121094, norm_rel=0.022559236735105515, ref_abs_avg=27.063831329345703, test_abs_avg=27.107812881469727
production_forward grad[42] vs paper_forward: mean_abs=0.8003364205360413, max_abs=6.5, mean_rel=0.1571660339832306, max_rel=1068.02880859375, norm_rel=0.024635324254631996, ref_abs_avg=32.57779312133789, test_abs_avg=32.57615280151367
production_forward grad[43] vs paper_forward: mean_abs=0.7403382658958435, max_abs=4.5, mean_rel=0.30976760387420654, max_rel=2765.624755859375, norm_rel=0.022947203367948532, ref_abs_avg=32.28892517089844, test_abs_avg=32.28030776977539
production_forward grad[44] vs paper_forward: mean_abs=0.5814127922058105, max_abs=2.25, mean_rel=0.10288812220096588, max_rel=8.952533721923828, norm_rel=0.022517090663313866, ref_abs_avg=26.719587326049805, test_abs_avg=26.770750045776367
production_forward grad[45] vs paper_forward: mean_abs=0.7560529708862305, max_abs=5.0, mean_rel=0.1513306349515915, max_rel=684.4465942382812, norm_rel=0.02422325126826763, ref_abs_avg=31.260786056518555, test_abs_avg=31.259994506835938
production_forward grad[46] vs paper_forward: mean_abs=0.7043888568878174, max_abs=5.25, mean_rel=0.27185317873954773, max_rel=1937.4998779296875, norm_rel=0.02304450049996376, ref_abs_avg=30.66269874572754, test_abs_avg=30.66488265991211
production_forward grad[47] vs paper_forward: mean_abs=0.5533537864685059, max_abs=2.03125, mean_rel=0.07784765958786011, max_rel=4.218051433563232, norm_rel=0.02203396148979664, ref_abs_avg=24.981801986694336, test_abs_avg=24.948165893554688
production_forward grad[48] vs paper_forward: mean_abs=0.7258870601654053, max_abs=6.0, mean_rel=0.15661367774009705, max_rel=1399.5848388671875, norm_rel=0.024077527225017548, ref_abs_avg=30.215709686279297, test_abs_avg=30.217084884643555
production_forward grad[49] vs paper_forward: mean_abs=0.672822117805481, max_abs=4.75, mean_rel=0.25545457005500793, max_rel=2406.25, norm_rel=0.022801073268055916, ref_abs_avg=29.586139678955078, test_abs_avg=29.582496643066406
production_forward grad[50] vs paper_forward: mean_abs=0.6031222343444824, max_abs=2.48828125, mean_rel=0.38381752371788025, max_rel=143.87554931640625, norm_rel=0.024697665125131607, ref_abs_avg=24.865428924560547, test_abs_avg=24.854049682617188
production_forward grad[51] vs paper_forward: mean_abs=0.8065313100814819, max_abs=7.0, mean_rel=0.17791393399238586, max_rel=1243.5694580078125, norm_rel=0.026015833020210266, ref_abs_avg=31.09296226501465, test_abs_avg=31.093284606933594
production_forward grad[52] vs paper_forward: mean_abs=0.7486549019813538, max_abs=4.5, mean_rel=0.2937788963317871, max_rel=2781.249755859375, norm_rel=0.02427602931857109, ref_abs_avg=30.901851654052734, test_abs_avg=30.894847869873047
production_forward grad[53] vs paper_forward: mean_abs=0.5811636447906494, max_abs=2.75, mean_rel=0.1305004358291626, max_rel=11.940048217773438, norm_rel=0.026490533724427223, ref_abs_avg=22.325645446777344, test_abs_avg=22.316669464111328
production_forward grad[54] vs paper_forward: mean_abs=0.7304344177246094, max_abs=5.0, mean_rel=0.16625052690505981, max_rel=1378.4970703125, norm_rel=0.02560984157025814, ref_abs_avg=28.594219207763672, test_abs_avg=28.595165252685547
production_forward grad[55] vs paper_forward: mean_abs=0.6811214685440063, max_abs=5.1875, mean_rel=0.2719574272632599, max_rel=2250.0, norm_rel=0.023883333429694176, ref_abs_avg=28.61441993713379, test_abs_avg=28.612323760986328
production_forward grad[56] vs paper_forward: mean_abs=0.5508749485015869, max_abs=2.0, mean_rel=0.13377299904823303, max_rel=16.809553146362305, norm_rel=0.024485090747475624, ref_abs_avg=22.880050659179688, test_abs_avg=22.809818267822266
production_forward grad[57] vs paper_forward: mean_abs=0.6768503785133362, max_abs=6.0, mean_rel=0.16609087586402893, max_rel=1419.639404296875, norm_rel=0.02501632086932659, ref_abs_avg=27.121376037597656, test_abs_avg=27.11888885498047
production_forward grad[58] vs paper_forward: mean_abs=0.6294729113578796, max_abs=4.390625, mean_rel=0.2513095438480377, max_rel=2187.5, norm_rel=0.02363649010658264, ref_abs_avg=26.707744598388672, test_abs_avg=26.70911407470703
production_forward grad[59] vs paper_forward: mean_abs=0.5275723934173584, max_abs=1.875, mean_rel=0.25033336877822876, max_rel=76.2752914428711, norm_rel=0.025156458839774132, ref_abs_avg=21.591808319091797, test_abs_avg=21.59579849243164
production_forward grad[60] vs paper_forward: mean_abs=0.6423983573913574, max_abs=5.0, mean_rel=0.16062989830970764, max_rel=1155.742431640625, norm_rel=0.024485213682055473, ref_abs_avg=26.261505126953125, test_abs_avg=26.26026153564453
production_forward grad[61] vs paper_forward: mean_abs=0.5958378314971924, max_abs=4.375, mean_rel=0.2564051151275635, max_rel=1999.9998779296875, norm_rel=0.02325410582125187, ref_abs_avg=25.69861602783203, test_abs_avg=25.69257164001465
production_forward grad[62] vs paper_forward: mean_abs=0.4825076460838318, max_abs=2.0, mean_rel=0.24627861380577087, max_rel=60.78620147705078, norm_rel=0.024070478975772858, ref_abs_avg=19.661205291748047, test_abs_avg=19.692310333251953
production_forward grad[63] vs paper_forward: mean_abs=0.616299033164978, max_abs=5.0, mean_rel=0.15852808952331543, max_rel=1235.43603515625, norm_rel=0.02420930564403534, ref_abs_avg=25.436874389648438, test_abs_avg=25.43368148803711
production_forward grad[64] vs paper_forward: mean_abs=0.5647894740104675, max_abs=4.0, mean_rel=0.2505607306957245, max_rel=2062.5, norm_rel=0.022594233974814415, ref_abs_avg=25.004112243652344, test_abs_avg=25.010208129882812
production_forward grad[65] vs paper_forward: mean_abs=0.4663780927658081, max_abs=1.78125, mean_rel=0.07190832495689392, max_rel=2.404695510864258, norm_rel=0.02286640554666519, ref_abs_avg=19.962997436523438, test_abs_avg=19.93768882751465
production_forward grad[66] vs paper_forward: mean_abs=0.5769692659378052, max_abs=4.5, mean_rel=0.1504892259836197, max_rel=944.3842163085938, norm_rel=0.024127911776304245, ref_abs_avg=23.94095802307129, test_abs_avg=23.94094467163086
production_forward grad[67] vs paper_forward: mean_abs=0.537763237953186, max_abs=3.9609375, mean_rel=0.2445511817932129, max_rel=2343.75, norm_rel=0.02252148650586605, ref_abs_avg=23.858699798583984, test_abs_avg=23.868202209472656
production_forward grad[68] vs paper_forward: mean_abs=0.4235954284667969, max_abs=2.25, mean_rel=0.11633986979722977, max_rel=9.852572441101074, norm_rel=0.02172701247036457, ref_abs_avg=19.804508209228516, test_abs_avg=19.827680587768555
production_forward grad[69] vs paper_forward: mean_abs=0.5540332794189453, max_abs=5.0, mean_rel=0.1493026167154312, max_rel=1004.4124145507812, norm_rel=0.023587137460708618, ref_abs_avg=23.480792999267578, test_abs_avg=23.48080825805664
production_forward grad[70] vs paper_forward: mean_abs=0.5077829360961914, max_abs=3.5, mean_rel=0.23198586702346802, max_rel=1687.4998779296875, norm_rel=0.02214641310274601, ref_abs_avg=22.937740325927734, test_abs_avg=22.929990768432617
production_forward grad[71] vs paper_forward: mean_abs=0.4130287170410156, max_abs=1.5, mean_rel=0.08166145533323288, max_rel=4.0108137130737305, norm_rel=0.021422214806079865, ref_abs_avg=19.191770553588867, test_abs_avg=19.19244956970215
production_forward grad[72] vs paper_forward: mean_abs=0.5276318192481995, max_abs=4.0, mean_rel=0.14682459831237793, max_rel=627.5460205078125, norm_rel=0.02344352751970291, ref_abs_avg=22.533653259277344, test_abs_avg=22.53409767150879
production_forward grad[73] vs paper_forward: mean_abs=0.4885445237159729, max_abs=3.5, mean_rel=0.19970865547657013, max_rel=1937.4998779296875, norm_rel=0.02169468253850937, ref_abs_avg=22.416643142700195, test_abs_avg=22.415807723999023
production_forward grad[74] vs paper_forward: mean_abs=0.45046544075012207, max_abs=1.8984375, mean_rel=0.10900837928056717, max_rel=6.571330547332764, norm_rel=0.02341669611632824, ref_abs_avg=19.574636459350586, test_abs_avg=19.59421157836914
production_forward grad[75] vs paper_forward: mean_abs=0.566820502281189, max_abs=4.5, mean_rel=0.15943077206611633, max_rel=1102.84521484375, norm_rel=0.024723615497350693, ref_abs_avg=22.94317054748535, test_abs_avg=22.942108154296875
production_forward grad[76] vs paper_forward: mean_abs=0.5267156362533569, max_abs=4.0, mean_rel=0.19967171549797058, max_rel=1187.5, norm_rel=0.023156261071562767, ref_abs_avg=22.753219604492188, test_abs_avg=22.746570587158203
production_forward grad[77] vs paper_forward: mean_abs=0.4058809280395508, max_abs=1.5, mean_rel=0.0755014568567276, max_rel=3.915604829788208, norm_rel=0.023478388786315918, ref_abs_avg=17.500629425048828, test_abs_avg=17.523151397705078
production_forward grad[78] vs paper_forward: mean_abs=0.5273388624191284, max_abs=5.5, mean_rel=0.14597705006599426, max_rel=563.022216796875, norm_rel=0.024184223264455795, ref_abs_avg=21.844898223876953, test_abs_avg=21.846359252929688
production_forward grad[79] vs paper_forward: mean_abs=0.4879382848739624, max_abs=4.0, mean_rel=0.19845205545425415, max_rel=1687.4998779296875, norm_rel=0.02235637791454792, ref_abs_avg=21.78752899169922, test_abs_avg=21.785842895507812
production_forward grad[80] vs paper_forward: mean_abs=0.3930702209472656, max_abs=1.75, mean_rel=0.09016318619251251, max_rel=6.175782680511475, norm_rel=0.021997764706611633, ref_abs_avg=18.13764762878418, test_abs_avg=18.141326904296875
production_forward grad[81] vs paper_forward: mean_abs=0.498077392578125, max_abs=4.0, mean_rel=0.1490194946527481, max_rel=895.39599609375, norm_rel=0.02377537451684475, ref_abs_avg=21.003768920898438, test_abs_avg=21.00454330444336
production_forward grad[82] vs paper_forward: mean_abs=0.45533454418182373, max_abs=4.0, mean_rel=0.20442670583724976, max_rel=1562.4998779296875, norm_rel=0.021871354430913925, ref_abs_avg=20.84107208251953, test_abs_avg=20.846759796142578
production_forward grad[83] vs paper_forward: mean_abs=0.363747239112854, max_abs=1.5, mean_rel=0.07057879120111465, max_rel=3.259699821472168, norm_rel=0.02081008441746235, ref_abs_avg=17.781660079956055, test_abs_avg=17.777080535888672
production_forward grad[84] vs paper_forward: mean_abs=0.47292453050613403, max_abs=5.25, mean_rel=0.1514979898929596, max_rel=834.5006713867188, norm_rel=0.023059675469994545, ref_abs_avg=20.543861389160156, test_abs_avg=20.54256820678711
production_forward grad[85] vs paper_forward: mean_abs=0.42671847343444824, max_abs=5.1875, mean_rel=0.20546123385429382, max_rel=2140.625, norm_rel=0.021415051072835922, ref_abs_avg=20.009241104125977, test_abs_avg=20.0081787109375
production_forward grad[86] vs paper_forward: mean_abs=0.36152589321136475, max_abs=1.625, mean_rel=0.25555458664894104, max_rel=55.377784729003906, norm_rel=0.022831236943602562, ref_abs_avg=15.832069396972656, test_abs_avg=15.84034538269043
production_forward grad[87] vs paper_forward: mean_abs=0.4418744444847107, max_abs=4.5, mean_rel=0.139777272939682, max_rel=927.7066650390625, norm_rel=0.022753475233912468, ref_abs_avg=19.4986572265625, test_abs_avg=19.49911880493164
production_forward grad[88] vs paper_forward: mean_abs=0.3966979384422302, max_abs=4.0, mean_rel=0.1839245706796646, max_rel=1531.2498779296875, norm_rel=0.020392367616295815, ref_abs_avg=19.41327476501465, test_abs_avg=19.42256736755371
production_forward grad[89] vs paper_forward: mean_abs=0.3174886703491211, max_abs=1.375, mean_rel=0.12244488298892975, max_rel=19.09549903869629, norm_rel=0.019993338733911514, ref_abs_avg=16.588550567626953, test_abs_avg=16.595319747924805
production_forward grad[90] vs paper_forward: mean_abs=0.4129825830459595, max_abs=4.0, mean_rel=0.13357841968536377, max_rel=674.7901611328125, norm_rel=0.022366316989064217, ref_abs_avg=18.581439971923828, test_abs_avg=18.5816593170166
production_forward grad[91] vs paper_forward: mean_abs=0.38649195432662964, max_abs=3.5, mean_rel=0.20815891027450562, max_rel=1937.4998779296875, norm_rel=0.021312372758984566, ref_abs_avg=18.304176330566406, test_abs_avg=18.299877166748047
production_forward grad[92] vs paper_forward: mean_abs=0.32126712799072266, max_abs=1.25, mean_rel=0.06913649290800095, max_rel=5.020767688751221, norm_rel=0.020588209852576256, ref_abs_avg=15.375414848327637, test_abs_avg=15.38524055480957
production_forward grad[93] vs paper_forward: mean_abs=0.38621026277542114, max_abs=4.5, mean_rel=0.13163869082927704, max_rel=1197.6424560546875, norm_rel=0.021748535335063934, ref_abs_avg=17.949642181396484, test_abs_avg=17.950010299682617
production_forward grad[94] vs paper_forward: mean_abs=0.3538472652435303, max_abs=3.5, mean_rel=0.17387117445468903, max_rel=1593.7498779296875, norm_rel=0.020577700808644295, ref_abs_avg=17.433692932128906, test_abs_avg=17.435222625732422
production_forward grad[95] vs paper_forward: mean_abs=0.29250383377075195, max_abs=1.0703125, mean_rel=0.06363518536090851, max_rel=4.002863883972168, norm_rel=0.019430525600910187, ref_abs_avg=15.280427932739258, test_abs_avg=15.259977340698242
production_forward grad[96] vs paper_forward: mean_abs=0.36681807041168213, max_abs=3.75, mean_rel=0.12579841911792755, max_rel=480.15863037109375, norm_rel=0.021260114386677742, ref_abs_avg=17.541900634765625, test_abs_avg=17.542400360107422
production_forward grad[97] vs paper_forward: mean_abs=0.3211383819580078, max_abs=3.5, mean_rel=0.1994909644126892, max_rel=1468.7498779296875, norm_rel=0.018813351169228554, ref_abs_avg=17.173433303833008, test_abs_avg=17.18271255493164
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016514456365257502, max_abs=0.03515625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008776606060564518, max_abs=0.328125, mean_rel=0.07467524707317352, max_rel=134.2149200439453, norm_rel=0.020378857851028442, ref_abs_avg=0.4651203751564026, test_abs_avg=0.46513181924819946
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.481072902679443, max_abs=52.0, mean_rel=0.14326220750808716, max_rel=173.9342041015625, norm_rel=0.020760323852300644, ref_abs_avg=320.385498046875, test_abs_avg=320.45526123046875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3420381546020508, max_abs=5.5, mean_rel=0.14076024293899536, max_rel=21.44224739074707, norm_rel=0.02556181699037552, ref_abs_avg=51.683963775634766, test_abs_avg=51.639305114746094
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6809965372085571, max_abs=12.0, mean_rel=0.17915430665016174, max_rel=3227.985107421875, norm_rel=0.025161124765872955, ref_abs_avg=67.28092956542969, test_abs_avg=67.28839874267578
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.554661750793457, max_abs=9.625, mean_rel=0.45151787996292114, max_rel=5999.99951171875, norm_rel=0.023590946570038795, ref_abs_avg=66.2819595336914, test_abs_avg=66.28787231445312
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1456737518310547, max_abs=4.0, mean_rel=0.10681824386119843, max_rel=10.06287956237793, norm_rel=0.024310892447829247, ref_abs_avg=47.24372863769531, test_abs_avg=47.279483795166016
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4694104194641113, max_abs=10.0, mean_rel=0.17240622639656067, max_rel=3970.647216796875, norm_rel=0.02492411434650421, ref_abs_avg=59.3345947265625, test_abs_avg=59.34080123901367
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.353178858757019, max_abs=8.5, mean_rel=0.4137333333492279, max_rel=5562.49951171875, norm_rel=0.023168783634901047, ref_abs_avg=58.82807159423828, test_abs_avg=58.84266662597656
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0366101264953613, max_abs=5.0, mean_rel=0.15294292569160461, max_rel=14.942733764648438, norm_rel=0.0238969624042511, ref_abs_avg=43.19041442871094, test_abs_avg=43.227630615234375
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3235390186309814, max_abs=10.0, mean_rel=0.16437946259975433, max_rel=1870.3543701171875, norm_rel=0.024606386199593544, ref_abs_avg=54.085750579833984, test_abs_avg=54.090476989746094
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2148641347885132, max_abs=8.0, mean_rel=0.3461759686470032, max_rel=3874.999755859375, norm_rel=0.022895274683833122, ref_abs_avg=53.34233093261719, test_abs_avg=53.35090637207031
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9488630294799805, max_abs=3.5, mean_rel=0.09537311643362045, max_rel=5.267368316650391, norm_rel=0.022254865616559982, ref_abs_avg=41.702728271484375, test_abs_avg=41.75086212158203
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.217087745666504, max_abs=9.0, mean_rel=0.15537315607070923, max_rel=1316.7362060546875, norm_rel=0.02444203943014145, ref_abs_avg=50.153404235839844, test_abs_avg=50.1529541015625
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1223620176315308, max_abs=6.875, mean_rel=0.38364437222480774, max_rel=3249.999755859375, norm_rel=0.02279708907008171, ref_abs_avg=49.49165725708008, test_abs_avg=49.4979362487793
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8694303035736084, max_abs=3.0, mean_rel=0.07464398443698883, max_rel=2.855541467666626, norm_rel=0.02232680842280388, ref_abs_avg=38.715599060058594, test_abs_avg=38.6744384765625
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1385953426361084, max_abs=8.0, mean_rel=0.17251229286193848, max_rel=1571.6776123046875, norm_rel=0.024120304733514786, ref_abs_avg=47.486175537109375, test_abs_avg=47.49090576171875
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0527877807617188, max_abs=6.5, mean_rel=0.3411274552345276, max_rel=3093.749755859375, norm_rel=0.022589227184653282, ref_abs_avg=46.840919494628906, test_abs_avg=46.83802795410156
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8445591926574707, max_abs=3.5, mean_rel=0.17022547125816345, max_rel=18.89071273803711, norm_rel=0.023191535845398903, ref_abs_avg=36.10573959350586, test_abs_avg=36.10765838623047
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0719408988952637, max_abs=8.0, mean_rel=0.16248416900634766, max_rel=3544.4453125, norm_rel=0.024030210450291634, ref_abs_avg=44.93026351928711, test_abs_avg=44.93064498901367
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.980155348777771, max_abs=6.375, mean_rel=0.33090439438819885, max_rel=3624.999755859375, norm_rel=0.022329844534397125, ref_abs_avg=44.11818313598633, test_abs_avg=44.12165069580078
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7600641250610352, max_abs=2.625, mean_rel=0.09157738089561462, max_rel=10.96582317352295, norm_rel=0.02173013985157013, ref_abs_avg=35.08606719970703, test_abs_avg=35.05591583251953
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.008514165878296, max_abs=7.0, mean_rel=0.1514451801776886, max_rel=1385.1466064453125, norm_rel=0.0239250548183918, ref_abs_avg=42.40650177001953, test_abs_avg=42.409324645996094
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9232826232910156, max_abs=6.0, mean_rel=0.25384950637817383, max_rel=2765.624755859375, norm_rel=0.02245151624083519, ref_abs_avg=41.304840087890625, test_abs_avg=41.30431365966797
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7423992156982422, max_abs=3.0, mean_rel=0.08607419580221176, max_rel=6.078099250793457, norm_rel=0.022616354748606682, ref_abs_avg=33.27883529663086, test_abs_avg=33.271575927734375
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9613339900970459, max_abs=8.0, mean_rel=0.1507396697998047, max_rel=934.3761596679688, norm_rel=0.023732895031571388, ref_abs_avg=40.760215759277344, test_abs_avg=40.76310729980469
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8855772614479065, max_abs=5.5, mean_rel=0.3007362484931946, max_rel=3031.249755859375, norm_rel=0.02223169431090355, ref_abs_avg=39.939151763916016, test_abs_avg=39.94538879394531
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8936192989349365, max_abs=3.3125, mean_rel=0.07937663793563843, max_rel=2.4482226371765137, norm_rel=0.024117784574627876, ref_abs_avg=36.96164321899414, test_abs_avg=36.900726318359375
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1288552284240723, max_abs=7.0, mean_rel=0.1857444941997528, max_rel=2195.85498046875, norm_rel=0.02560983970761299, ref_abs_avg=44.319976806640625, test_abs_avg=44.3217658996582
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0437276363372803, max_abs=7.0, mean_rel=0.33971112966537476, max_rel=3437.499755859375, norm_rel=0.023942047730088234, ref_abs_avg=43.80104446411133, test_abs_avg=43.809356689453125
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8051919937133789, max_abs=3.25, mean_rel=0.1083274707198143, max_rel=9.031271934509277, norm_rel=0.026545293629169464, ref_abs_avg=30.712608337402344, test_abs_avg=30.717281341552734
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.044221043586731, max_abs=8.0, mean_rel=0.16930577158927917, max_rel=1198.164306640625, norm_rel=0.02588712051510811, ref_abs_avg=40.515968322753906, test_abs_avg=40.519920349121094
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9807718992233276, max_abs=6.65625, mean_rel=0.3350585699081421, max_rel=4218.75, norm_rel=0.0244212094694376, ref_abs_avg=40.3096923828125, test_abs_avg=40.31743240356445
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7697410583496094, max_abs=3.0, mean_rel=0.1047598347067833, max_rel=6.365245819091797, norm_rel=0.02623044140636921, ref_abs_avg=29.953933715820312, test_abs_avg=29.89706039428711
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9646303653717041, max_abs=8.0, mean_rel=0.17024797201156616, max_rel=1253.2413330078125, norm_rel=0.025727886706590652, ref_abs_avg=37.66570281982422, test_abs_avg=37.66923904418945
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8987489938735962, max_abs=5.5, mean_rel=0.33399903774261475, max_rel=3874.999755859375, norm_rel=0.024111084640026093, ref_abs_avg=37.35264587402344, test_abs_avg=37.3587532043457
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7055989503860474, max_abs=2.28125, mean_rel=0.08882751315832138, max_rel=4.052900791168213, norm_rel=0.024092497304081917, ref_abs_avg=29.43256950378418, test_abs_avg=29.37546157836914
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.9104050397872925, max_abs=6.5, mean_rel=0.17384083569049835, max_rel=1657.400390625, norm_rel=0.025420118123292923, ref_abs_avg=35.95185089111328, test_abs_avg=35.949493408203125
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.844855546951294, max_abs=5.25, mean_rel=0.27793076634407043, max_rel=2812.499755859375, norm_rel=0.023983173072338104, ref_abs_avg=35.345184326171875, test_abs_avg=35.3428955078125
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.658482551574707, max_abs=2.5, mean_rel=0.09556768834590912, max_rel=5.664284706115723, norm_rel=0.023442519828677177, ref_abs_avg=28.12024688720703, test_abs_avg=28.163589477539062
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8514772653579712, max_abs=5.5, mean_rel=0.16722393035888672, max_rel=1027.9033203125, norm_rel=0.02522016502916813, ref_abs_avg=33.833045959472656, test_abs_avg=33.8328857421875
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7942591905593872, max_abs=5.0, mean_rel=0.2576030194759369, max_rel=2125.0, norm_rel=0.02391677163541317, ref_abs_avg=33.315223693847656, test_abs_avg=33.31535339355469
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6229583024978638, max_abs=2.25, mean_rel=0.17408481240272522, max_rel=39.28804397583008, norm_rel=0.023127375170588493, ref_abs_avg=27.063831329345703, test_abs_avg=27.086977005004883
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.8113088011741638, max_abs=6.0, mean_rel=0.1584966480731964, max_rel=806.2532348632812, norm_rel=0.024972185492515564, ref_abs_avg=32.57779312133789, test_abs_avg=32.57571029663086
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7549055814743042, max_abs=4.5, mean_rel=0.322979599237442, max_rel=2812.499755859375, norm_rel=0.023380698636174202, ref_abs_avg=32.28892517089844, test_abs_avg=32.28232192993164
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5819200277328491, max_abs=2.125, mean_rel=0.08846460282802582, max_rel=5.374151229858398, norm_rel=0.022501835599541664, ref_abs_avg=26.719587326049805, test_abs_avg=26.75763511657715
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7649879455566406, max_abs=5.5, mean_rel=0.1576157808303833, max_rel=782.7977905273438, norm_rel=0.024487201124429703, ref_abs_avg=31.260786056518555, test_abs_avg=31.261627197265625
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7139270305633545, max_abs=4.75, mean_rel=0.26082876324653625, max_rel=2062.5, norm_rel=0.023360053077340126, ref_abs_avg=30.66269874572754, test_abs_avg=30.6641845703125
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5529496669769287, max_abs=2.25, mean_rel=0.09920332580804825, max_rel=15.737227439880371, norm_rel=0.02243654429912567, ref_abs_avg=24.981801986694336, test_abs_avg=24.926979064941406
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7345855236053467, max_abs=5.0, mean_rel=0.1583588421344757, max_rel=1565.505859375, norm_rel=0.02436264418065548, ref_abs_avg=30.215709686279297, test_abs_avg=30.217927932739258
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6785907745361328, max_abs=4.5, mean_rel=0.2760230004787445, max_rel=2562.5, norm_rel=0.02299467660486698, ref_abs_avg=29.586139678955078, test_abs_avg=29.585542678833008
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6270864009857178, max_abs=2.41015625, mean_rel=0.36415618658065796, max_rel=124.87630462646484, norm_rel=0.025434471666812897, ref_abs_avg=24.865428924560547, test_abs_avg=24.853954315185547
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.820358395576477, max_abs=6.0, mean_rel=0.18007753789424896, max_rel=1425.450927734375, norm_rel=0.02646750584244728, ref_abs_avg=31.09296226501465, test_abs_avg=31.093608856201172
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7596976161003113, max_abs=5.3125, mean_rel=0.28353601694107056, max_rel=2500.0, norm_rel=0.024643849581480026, ref_abs_avg=30.901851654052734, test_abs_avg=30.895008087158203
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5735199451446533, max_abs=2.75, mean_rel=0.13062714040279388, max_rel=15.44538402557373, norm_rel=0.026024818420410156, ref_abs_avg=22.325645446777344, test_abs_avg=22.3559513092041
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7406054139137268, max_abs=6.0, mean_rel=0.16423876583576202, max_rel=1511.130615234375, norm_rel=0.02597271278500557, ref_abs_avg=28.594219207763672, test_abs_avg=28.593761444091797
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6903992891311646, max_abs=6.375, mean_rel=0.28536367416381836, max_rel=1937.4998779296875, norm_rel=0.02423042058944702, ref_abs_avg=28.61441993713379, test_abs_avg=28.615612030029297
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5820584297180176, max_abs=2.125, mean_rel=0.10597987473011017, max_rel=9.26797103881836, norm_rel=0.025682948529720306, ref_abs_avg=22.880050659179688, test_abs_avg=22.82257080078125
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6854405403137207, max_abs=5.0, mean_rel=0.17113551497459412, max_rel=1207.1068115234375, norm_rel=0.02534334547817707, ref_abs_avg=27.121376037597656, test_abs_avg=27.117359161376953
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6390461921691895, max_abs=4.125, mean_rel=0.26667577028274536, max_rel=1999.9998779296875, norm_rel=0.024000097066164017, ref_abs_avg=26.707744598388672, test_abs_avg=26.710132598876953
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5354955196380615, max_abs=2.0625, mean_rel=0.2951195240020752, max_rel=105.73493957519531, norm_rel=0.025159800425171852, ref_abs_avg=21.591808319091797, test_abs_avg=21.597976684570312
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6489790081977844, max_abs=5.0, mean_rel=0.16608406603336334, max_rel=1406.2364501953125, norm_rel=0.024740440770983696, ref_abs_avg=26.261505126953125, test_abs_avg=26.260570526123047
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.601388692855835, max_abs=4.375, mean_rel=0.24648413062095642, max_rel=1812.4998779296875, norm_rel=0.023472243919968605, ref_abs_avg=25.69861602783203, test_abs_avg=25.695281982421875
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4701690077781677, max_abs=2.25, mean_rel=0.23670722544193268, max_rel=58.650508880615234, norm_rel=0.02397291362285614, ref_abs_avg=19.661205291748047, test_abs_avg=19.69705581665039
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6220391392707825, max_abs=5.0, mean_rel=0.1598474085330963, max_rel=745.7316284179688, norm_rel=0.02442999929189682, ref_abs_avg=25.436874389648438, test_abs_avg=25.43529510498047
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5722506642341614, max_abs=4.0, mean_rel=0.26779815554618835, max_rel=2593.749755859375, norm_rel=0.022876273840665817, ref_abs_avg=25.004112243652344, test_abs_avg=25.011219024658203
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.48545026779174805, max_abs=2.03125, mean_rel=0.09588024020195007, max_rel=10.789581298828125, norm_rel=0.023726653307676315, ref_abs_avg=19.962997436523438, test_abs_avg=19.924560546875
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5823907256126404, max_abs=5.0, mean_rel=0.15188255906105042, max_rel=1037.5072021484375, norm_rel=0.024341445416212082, ref_abs_avg=23.94095802307129, test_abs_avg=23.940959930419922
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5453014373779297, max_abs=4.125, mean_rel=0.24121597409248352, max_rel=1843.7498779296875, norm_rel=0.022831400856375694, ref_abs_avg=23.858699798583984, test_abs_avg=23.868202209472656
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4403409957885742, max_abs=2.0, mean_rel=0.1374422013759613, max_rel=13.514163970947266, norm_rel=0.022489046677947044, ref_abs_avg=19.804508209228516, test_abs_avg=19.842187881469727
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5579648017883301, max_abs=5.25, mean_rel=0.14719989895820618, max_rel=1021.584228515625, norm_rel=0.02375044673681259, ref_abs_avg=23.480792999267578, test_abs_avg=23.48072052001953
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.508756160736084, max_abs=4.0, mean_rel=0.22028347849845886, max_rel=1656.2498779296875, norm_rel=0.022159621119499207, ref_abs_avg=22.937740325927734, test_abs_avg=22.928855895996094
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.41225624084472656, max_abs=1.5, mean_rel=0.07920604944229126, max_rel=4.210246562957764, norm_rel=0.021734772250056267, ref_abs_avg=19.191770553588867, test_abs_avg=19.182708740234375
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5315982699394226, max_abs=4.5, mean_rel=0.14690877497196198, max_rel=652.3425903320312, norm_rel=0.023619504645466805, ref_abs_avg=22.533653259277344, test_abs_avg=22.533634185791016
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.49423593282699585, max_abs=3.75, mean_rel=0.2044258713722229, max_rel=1765.6248779296875, norm_rel=0.021970238536596298, ref_abs_avg=22.416643142700195, test_abs_avg=22.42011260986328
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.44718217849731445, max_abs=1.578125, mean_rel=0.09765774011611938, max_rel=6.176656246185303, norm_rel=0.023369017988443375, ref_abs_avg=19.574636459350586, test_abs_avg=19.58858871459961
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.572927713394165, max_abs=5.0, mean_rel=0.15900693833827972, max_rel=835.4698486328125, norm_rel=0.02499198168516159, ref_abs_avg=22.94317054748535, test_abs_avg=22.942005157470703
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5312764644622803, max_abs=5.0, mean_rel=0.21767276525497437, max_rel=1218.75, norm_rel=0.023364873602986336, ref_abs_avg=22.753219604492188, test_abs_avg=22.748031616210938
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4008922576904297, max_abs=1.375, mean_rel=0.06734243780374527, max_rel=3.8916561603546143, norm_rel=0.023524686694145203, ref_abs_avg=17.500629425048828, test_abs_avg=17.5191707611084
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5334063172340393, max_abs=4.5, mean_rel=0.15060552954673767, max_rel=907.0635375976562, norm_rel=0.024445200338959694, ref_abs_avg=21.844898223876953, test_abs_avg=21.846759796142578
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.4918977618217468, max_abs=4.25, mean_rel=0.2036091387271881, max_rel=1648.4373779296875, norm_rel=0.022524315863847733, ref_abs_avg=21.78752899169922, test_abs_avg=21.786529541015625
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3861280679702759, max_abs=1.5, mean_rel=0.0866028219461441, max_rel=3.539393424987793, norm_rel=0.02167259342968464, ref_abs_avg=18.13764762878418, test_abs_avg=18.121990203857422
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5013935565948486, max_abs=5.0, mean_rel=0.15196064114570618, max_rel=918.3092651367188, norm_rel=0.023930568248033524, ref_abs_avg=21.003768920898438, test_abs_avg=21.00442886352539
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.45649024844169617, max_abs=4.0625, mean_rel=0.20403946936130524, max_rel=1281.25, norm_rel=0.02188768982887268, ref_abs_avg=20.84107208251953, test_abs_avg=20.846221923828125
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.36956310272216797, max_abs=1.5, mean_rel=0.07336483895778656, max_rel=3.284162759780884, norm_rel=0.020856251940131187, ref_abs_avg=17.781660079956055, test_abs_avg=17.779014587402344
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.47558605670928955, max_abs=5.0, mean_rel=0.15519066154956818, max_rel=1411.847412109375, norm_rel=0.02319343388080597, ref_abs_avg=20.543861389160156, test_abs_avg=20.543136596679688
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.42923063039779663, max_abs=4.25, mean_rel=0.2118145227432251, max_rel=2015.6248779296875, norm_rel=0.021492095664143562, ref_abs_avg=20.009241104125977, test_abs_avg=20.008935928344727
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3659151792526245, max_abs=1.625, mean_rel=0.2492174655199051, max_rel=41.66666793823242, norm_rel=0.02319568581879139, ref_abs_avg=15.832069396972656, test_abs_avg=15.830421447753906
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4434308111667633, max_abs=4.5, mean_rel=0.14155876636505127, max_rel=833.5518188476562, norm_rel=0.022840023040771484, ref_abs_avg=19.4986572265625, test_abs_avg=19.498531341552734
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.40243417024612427, max_abs=4.0, mean_rel=0.20076802372932434, max_rel=1078.125, norm_rel=0.020727964118123055, ref_abs_avg=19.41327476501465, test_abs_avg=19.4237117767334
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.3124939799308777, max_abs=1.25, mean_rel=0.10929819941520691, max_rel=16.643802642822266, norm_rel=0.019500723108649254, ref_abs_avg=16.588550567626953, test_abs_avg=16.597400665283203
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4149011969566345, max_abs=4.0, mean_rel=0.13658763468265533, max_rel=751.865234375, norm_rel=0.02245386317372322, ref_abs_avg=18.581439971923828, test_abs_avg=18.58224868774414
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.38454312086105347, max_abs=4.0, mean_rel=0.21489714086055756, max_rel=1968.7498779296875, norm_rel=0.021218443289399147, ref_abs_avg=18.304176330566406, test_abs_avg=18.302196502685547
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.31634771823883057, max_abs=1.25, mean_rel=0.05774403735995293, max_rel=2.525390625, norm_rel=0.020432349294424057, ref_abs_avg=15.375414848327637, test_abs_avg=15.385040283203125
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.38699668645858765, max_abs=4.0, mean_rel=0.13078537583351135, max_rel=1159.6160888671875, norm_rel=0.021793948486447334, ref_abs_avg=17.949642181396484, test_abs_avg=17.949771881103516
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.354533314704895, max_abs=4.0, mean_rel=0.16965976357460022, max_rel=1187.5, norm_rel=0.020601456984877586, ref_abs_avg=17.433692932128906, test_abs_avg=17.435121536254883
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.305179238319397, max_abs=1.125, mean_rel=0.06895232945680618, max_rel=6.8295087814331055, norm_rel=0.019986853003501892, ref_abs_avg=15.280427932739258, test_abs_avg=15.258193016052246
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3667683005332947, max_abs=4.09375, mean_rel=0.12382487952709198, max_rel=454.77557373046875, norm_rel=0.02127457596361637, ref_abs_avg=17.541900634765625, test_abs_avg=17.542476654052734
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3178906738758087, max_abs=3.5, mean_rel=0.19734477996826172, max_rel=1062.5, norm_rel=0.01862182468175888, ref_abs_avg=17.173433303833008, test_abs_avg=17.179866790771484
production_forward2 vs paper_forward output: mean_abs=0.0016474537551403046, max_abs=0.03515625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008428329601883888, max_abs=0.34375, mean_rel=0.07201394438743591, max_rel=156.1593017578125, norm_rel=0.01970323920249939, ref_abs_avg=0.4651203751564026, test_abs_avg=0.465143620967865
production_forward2 grad[1] vs paper_forward: mean_abs=7.3262248039245605, max_abs=56.0, mean_rel=0.14497508108615875, max_rel=193.55484008789062, norm_rel=0.020324451848864555, ref_abs_avg=320.385498046875, test_abs_avg=320.4139404296875
production_forward2 grad[2] vs paper_forward: mean_abs=1.2882776260375977, max_abs=4.75, mean_rel=0.1626555174589157, max_rel=26.58014488220215, norm_rel=0.024343719705939293, ref_abs_avg=51.683963775634766, test_abs_avg=51.628509521484375
production_forward2 grad[3] vs paper_forward: mean_abs=1.6319026947021484, max_abs=12.0, mean_rel=0.17638181149959564, max_rel=3421.9482421875, norm_rel=0.02443384937942028, ref_abs_avg=67.28092956542969, test_abs_avg=67.28878784179688
production_forward2 grad[4] vs paper_forward: mean_abs=1.5059657096862793, max_abs=9.5, mean_rel=0.3927660882472992, max_rel=6624.99951171875, norm_rel=0.022865118458867073, ref_abs_avg=66.2819595336914, test_abs_avg=66.28575134277344
production_forward2 grad[5] vs paper_forward: mean_abs=1.078756332397461, max_abs=4.0, mean_rel=0.11450675874948502, max_rel=11.645564079284668, norm_rel=0.023246759548783302, ref_abs_avg=47.24372863769531, test_abs_avg=47.268226623535156
production_forward2 grad[6] vs paper_forward: mean_abs=1.4291366338729858, max_abs=9.25, mean_rel=0.1575835943222046, max_rel=2077.6572265625, norm_rel=0.0242433100938797, ref_abs_avg=59.3345947265625, test_abs_avg=59.343299865722656
production_forward2 grad[7] vs paper_forward: mean_abs=1.311851978302002, max_abs=8.0, mean_rel=0.37080109119415283, max_rel=4125.0, norm_rel=0.022445550188422203, ref_abs_avg=58.82807159423828, test_abs_avg=58.837493896484375
production_forward2 grad[8] vs paper_forward: mean_abs=1.0446711778640747, max_abs=3.75, mean_rel=0.16295680403709412, max_rel=13.94681167602539, norm_rel=0.023762017488479614, ref_abs_avg=43.19041442871094, test_abs_avg=43.270790100097656
production_forward2 grad[9] vs paper_forward: mean_abs=1.2884612083435059, max_abs=11.0, mean_rel=0.16635318100452423, max_rel=2002.807861328125, norm_rel=0.023964600637555122, ref_abs_avg=54.085750579833984, test_abs_avg=54.090171813964844
production_forward2 grad[10] vs paper_forward: mean_abs=1.1773934364318848, max_abs=6.75, mean_rel=0.34480851888656616, max_rel=4031.249755859375, norm_rel=0.022201478481292725, ref_abs_avg=53.34233093261719, test_abs_avg=53.350608825683594
production_forward2 grad[11] vs paper_forward: mean_abs=0.8985996246337891, max_abs=4.625, mean_rel=0.07911416888237, max_rel=5.129215717315674, norm_rel=0.02146511897444725, ref_abs_avg=41.702728271484375, test_abs_avg=41.7352180480957
production_forward2 grad[12] vs paper_forward: mean_abs=1.1851798295974731, max_abs=8.0, mean_rel=0.1514866203069687, max_rel=912.7742919921875, norm_rel=0.023789238184690475, ref_abs_avg=50.153404235839844, test_abs_avg=50.15453338623047
production_forward2 grad[13] vs paper_forward: mean_abs=1.0866889953613281, max_abs=6.25, mean_rel=0.37001290917396545, max_rel=3281.249755859375, norm_rel=0.022082902491092682, ref_abs_avg=49.49165725708008, test_abs_avg=49.49570846557617
production_forward2 grad[14] vs paper_forward: mean_abs=0.8376445770263672, max_abs=3.1875, mean_rel=0.08660368621349335, max_rel=5.926448822021484, norm_rel=0.02173084206879139, ref_abs_avg=38.715599060058594, test_abs_avg=38.69419479370117
production_forward2 grad[15] vs paper_forward: mean_abs=1.1120975017547607, max_abs=7.5, mean_rel=0.1664825975894928, max_rel=1815.382568359375, norm_rel=0.023563271388411522, ref_abs_avg=47.486175537109375, test_abs_avg=47.488304138183594
production_forward2 grad[16] vs paper_forward: mean_abs=1.0219444036483765, max_abs=6.25, mean_rel=0.37130218744277954, max_rel=3249.999755859375, norm_rel=0.021933751180768013, ref_abs_avg=46.840919494628906, test_abs_avg=46.83921432495117
production_forward2 grad[17] vs paper_forward: mean_abs=0.8374319076538086, max_abs=3.25, mean_rel=0.14589127898216248, max_rel=12.770248413085938, norm_rel=0.023135628551244736, ref_abs_avg=36.10573959350586, test_abs_avg=36.1253662109375
production_forward2 grad[18] vs paper_forward: mean_abs=1.0474050045013428, max_abs=7.0, mean_rel=0.15620866417884827, max_rel=2198.346435546875, norm_rel=0.023465843871235847, ref_abs_avg=44.93026351928711, test_abs_avg=44.93057632446289
production_forward2 grad[19] vs paper_forward: mean_abs=0.9562181234359741, max_abs=5.75, mean_rel=0.3114278316497803, max_rel=2999.999755859375, norm_rel=0.02180556207895279, ref_abs_avg=44.11818313598633, test_abs_avg=44.12123107910156
production_forward2 grad[20] vs paper_forward: mean_abs=0.7504551410675049, max_abs=2.5, mean_rel=0.08024363219738007, max_rel=7.2780866622924805, norm_rel=0.02143731527030468, ref_abs_avg=35.08606719970703, test_abs_avg=35.06700134277344
production_forward2 grad[21] vs paper_forward: mean_abs=0.9881983995437622, max_abs=7.25, mean_rel=0.15158414840698242, max_rel=1360.4080810546875, norm_rel=0.023435518145561218, ref_abs_avg=42.40650177001953, test_abs_avg=42.41136169433594
production_forward2 grad[22] vs paper_forward: mean_abs=0.9006907939910889, max_abs=5.84375, mean_rel=0.26940709352493286, max_rel=3218.749755859375, norm_rel=0.021918637678027153, ref_abs_avg=41.304840087890625, test_abs_avg=41.30406951904297
production_forward2 grad[23] vs paper_forward: mean_abs=0.6862545013427734, max_abs=2.53125, mean_rel=0.07410527765750885, max_rel=6.629757881164551, norm_rel=0.02092505805194378, ref_abs_avg=33.27883529663086, test_abs_avg=33.244483947753906
production_forward2 grad[24] vs paper_forward: mean_abs=0.9420225024223328, max_abs=7.0, mean_rel=0.15024858713150024, max_rel=1564.4876708984375, norm_rel=0.02326606586575508, ref_abs_avg=40.760215759277344, test_abs_avg=40.76215362548828
production_forward2 grad[25] vs paper_forward: mean_abs=0.8653088808059692, max_abs=5.09375, mean_rel=0.30684587359428406, max_rel=2500.0, norm_rel=0.021761151030659676, ref_abs_avg=39.939151763916016, test_abs_avg=39.94526672363281
production_forward2 grad[26] vs paper_forward: mean_abs=0.8560327291488647, max_abs=3.46875, mean_rel=0.08107890188694, max_rel=3.714089870452881, norm_rel=0.02375800907611847, ref_abs_avg=36.96164321899414, test_abs_avg=36.929447174072266
production_forward2 grad[27] vs paper_forward: mean_abs=1.1045684814453125, max_abs=8.0, mean_rel=0.17695939540863037, max_rel=2678.521240234375, norm_rel=0.025094211101531982, ref_abs_avg=44.319976806640625, test_abs_avg=44.32276916503906
production_forward2 grad[28] vs paper_forward: mean_abs=1.0201131105422974, max_abs=7.0, mean_rel=0.34505540132522583, max_rel=3187.499755859375, norm_rel=0.02339993044734001, ref_abs_avg=43.80104446411133, test_abs_avg=43.807701110839844
production_forward2 grad[29] vs paper_forward: mean_abs=0.8280737400054932, max_abs=3.0, mean_rel=0.16474369168281555, max_rel=20.96889305114746, norm_rel=0.026967329904437065, ref_abs_avg=30.712608337402344, test_abs_avg=30.687854766845703
production_forward2 grad[30] vs paper_forward: mean_abs=1.0255470275878906, max_abs=7.5, mean_rel=0.16216567158699036, max_rel=1291.0137939453125, norm_rel=0.025425592437386513, ref_abs_avg=40.515968322753906, test_abs_avg=40.52092742919922
production_forward2 grad[31] vs paper_forward: mean_abs=0.9610975980758667, max_abs=6.65625, mean_rel=0.3761042654514313, max_rel=4156.25, norm_rel=0.02390921674668789, ref_abs_avg=40.3096923828125, test_abs_avg=40.317344665527344
production_forward2 grad[32] vs paper_forward: mean_abs=0.727543830871582, max_abs=3.0, mean_rel=0.09235252439975739, max_rel=5.418854713439941, norm_rel=0.024433063343167305, ref_abs_avg=29.953933715820312, test_abs_avg=29.899127960205078
production_forward2 grad[33] vs paper_forward: mean_abs=0.9455820322036743, max_abs=6.0, mean_rel=0.16270336508750916, max_rel=921.736083984375, norm_rel=0.025213666260242462, ref_abs_avg=37.66570281982422, test_abs_avg=37.668601989746094
production_forward2 grad[34] vs paper_forward: mean_abs=0.8843206167221069, max_abs=5.25, mean_rel=0.3287442922592163, max_rel=3499.999755859375, norm_rel=0.02371043525636196, ref_abs_avg=37.35264587402344, test_abs_avg=37.360198974609375
production_forward2 grad[35] vs paper_forward: mean_abs=0.6696498394012451, max_abs=2.75, mean_rel=0.0813673585653305, max_rel=4.923919200897217, norm_rel=0.023134900256991386, ref_abs_avg=29.43256950378418, test_abs_avg=29.420875549316406
production_forward2 grad[36] vs paper_forward: mean_abs=0.8947149515151978, max_abs=6.0, mean_rel=0.16922146081924438, max_rel=1642.531494140625, norm_rel=0.024983569979667664, ref_abs_avg=35.95185089111328, test_abs_avg=35.949337005615234
production_forward2 grad[37] vs paper_forward: mean_abs=0.830243706703186, max_abs=5.25, mean_rel=0.2766008675098419, max_rel=2437.5, norm_rel=0.023580366745591164, ref_abs_avg=35.345184326171875, test_abs_avg=35.343482971191406
production_forward2 grad[38] vs paper_forward: mean_abs=0.6453170776367188, max_abs=2.625, mean_rel=0.09498566389083862, max_rel=5.693464279174805, norm_rel=0.022795790806412697, ref_abs_avg=28.12024688720703, test_abs_avg=28.173126220703125
production_forward2 grad[39] vs paper_forward: mean_abs=0.8376234769821167, max_abs=7.0, mean_rel=0.1655673235654831, max_rel=1326.944580078125, norm_rel=0.024832382798194885, ref_abs_avg=33.833045959472656, test_abs_avg=33.83302307128906
production_forward2 grad[40] vs paper_forward: mean_abs=0.7814157009124756, max_abs=4.5, mean_rel=0.2633667588233948, max_rel=2093.75, norm_rel=0.02352430485188961, ref_abs_avg=33.315223693847656, test_abs_avg=33.31623840332031
production_forward2 grad[41] vs paper_forward: mean_abs=0.6175985336303711, max_abs=2.125, mean_rel=0.20230931043624878, max_rel=55.76170349121094, norm_rel=0.022559236735105515, ref_abs_avg=27.063831329345703, test_abs_avg=27.107812881469727
production_forward2 grad[42] vs paper_forward: mean_abs=0.8003364205360413, max_abs=6.5, mean_rel=0.1571660339832306, max_rel=1068.02880859375, norm_rel=0.024635324254631996, ref_abs_avg=32.57779312133789, test_abs_avg=32.57615280151367
production_forward2 grad[43] vs paper_forward: mean_abs=0.7403382658958435, max_abs=4.5, mean_rel=0.30976760387420654, max_rel=2765.624755859375, norm_rel=0.022947203367948532, ref_abs_avg=32.28892517089844, test_abs_avg=32.28030776977539
production_forward2 grad[44] vs paper_forward: mean_abs=0.5814127922058105, max_abs=2.25, mean_rel=0.10288812220096588, max_rel=8.952533721923828, norm_rel=0.022517090663313866, ref_abs_avg=26.719587326049805, test_abs_avg=26.770750045776367
production_forward2 grad[45] vs paper_forward: mean_abs=0.7560529708862305, max_abs=5.0, mean_rel=0.1513306349515915, max_rel=684.4465942382812, norm_rel=0.02422325126826763, ref_abs_avg=31.260786056518555, test_abs_avg=31.259994506835938
production_forward2 grad[46] vs paper_forward: mean_abs=0.7043888568878174, max_abs=5.25, mean_rel=0.27185317873954773, max_rel=1937.4998779296875, norm_rel=0.02304450049996376, ref_abs_avg=30.66269874572754, test_abs_avg=30.66488265991211
production_forward2 grad[47] vs paper_forward: mean_abs=0.5533537864685059, max_abs=2.03125, mean_rel=0.07784765958786011, max_rel=4.218051433563232, norm_rel=0.02203396148979664, ref_abs_avg=24.981801986694336, test_abs_avg=24.948165893554688
production_forward2 grad[48] vs paper_forward: mean_abs=0.7258870601654053, max_abs=6.0, mean_rel=0.15661367774009705, max_rel=1399.5848388671875, norm_rel=0.024077527225017548, ref_abs_avg=30.215709686279297, test_abs_avg=30.217084884643555
production_forward2 grad[49] vs paper_forward: mean_abs=0.672822117805481, max_abs=4.75, mean_rel=0.25545457005500793, max_rel=2406.25, norm_rel=0.022801073268055916, ref_abs_avg=29.586139678955078, test_abs_avg=29.582496643066406
production_forward2 grad[50] vs paper_forward: mean_abs=0.6031222343444824, max_abs=2.48828125, mean_rel=0.38381752371788025, max_rel=143.87554931640625, norm_rel=0.024697665125131607, ref_abs_avg=24.865428924560547, test_abs_avg=24.854049682617188
production_forward2 grad[51] vs paper_forward: mean_abs=0.8065313100814819, max_abs=7.0, mean_rel=0.17791393399238586, max_rel=1243.5694580078125, norm_rel=0.026015833020210266, ref_abs_avg=31.09296226501465, test_abs_avg=31.093284606933594
production_forward2 grad[52] vs paper_forward: mean_abs=0.7486549019813538, max_abs=4.5, mean_rel=0.2937788963317871, max_rel=2781.249755859375, norm_rel=0.02427602931857109, ref_abs_avg=30.901851654052734, test_abs_avg=30.894847869873047
production_forward2 grad[53] vs paper_forward: mean_abs=0.5811636447906494, max_abs=2.75, mean_rel=0.1305004358291626, max_rel=11.940048217773438, norm_rel=0.026490533724427223, ref_abs_avg=22.325645446777344, test_abs_avg=22.316669464111328
production_forward2 grad[54] vs paper_forward: mean_abs=0.7304344177246094, max_abs=5.0, mean_rel=0.16625052690505981, max_rel=1378.4970703125, norm_rel=0.02560984157025814, ref_abs_avg=28.594219207763672, test_abs_avg=28.595165252685547
production_forward2 grad[55] vs paper_forward: mean_abs=0.6811214685440063, max_abs=5.1875, mean_rel=0.2719574272632599, max_rel=2250.0, norm_rel=0.023883333429694176, ref_abs_avg=28.61441993713379, test_abs_avg=28.612323760986328
production_forward2 grad[56] vs paper_forward: mean_abs=0.5508749485015869, max_abs=2.0, mean_rel=0.13377299904823303, max_rel=16.809553146362305, norm_rel=0.024485090747475624, ref_abs_avg=22.880050659179688, test_abs_avg=22.809818267822266
production_forward2 grad[57] vs paper_forward: mean_abs=0.6768503785133362, max_abs=6.0, mean_rel=0.16609087586402893, max_rel=1419.639404296875, norm_rel=0.02501632086932659, ref_abs_avg=27.121376037597656, test_abs_avg=27.11888885498047
production_forward2 grad[58] vs paper_forward: mean_abs=0.6294729113578796, max_abs=4.390625, mean_rel=0.2513095438480377, max_rel=2187.5, norm_rel=0.02363649010658264, ref_abs_avg=26.707744598388672, test_abs_avg=26.70911407470703
production_forward2 grad[59] vs paper_forward: mean_abs=0.5275723934173584, max_abs=1.875, mean_rel=0.25033336877822876, max_rel=76.2752914428711, norm_rel=0.025156458839774132, ref_abs_avg=21.591808319091797, test_abs_avg=21.59579849243164
production_forward2 grad[60] vs paper_forward: mean_abs=0.6423983573913574, max_abs=5.0, mean_rel=0.16062989830970764, max_rel=1155.742431640625, norm_rel=0.024485213682055473, ref_abs_avg=26.261505126953125, test_abs_avg=26.26026153564453
production_forward2 grad[61] vs paper_forward: mean_abs=0.5958378314971924, max_abs=4.375, mean_rel=0.2564051151275635, max_rel=1999.9998779296875, norm_rel=0.02325410582125187, ref_abs_avg=25.69861602783203, test_abs_avg=25.69257164001465
production_forward2 grad[62] vs paper_forward: mean_abs=0.4825076460838318, max_abs=2.0, mean_rel=0.24627861380577087, max_rel=60.78620147705078, norm_rel=0.024070478975772858, ref_abs_avg=19.661205291748047, test_abs_avg=19.692310333251953
production_forward2 grad[63] vs paper_forward: mean_abs=0.616299033164978, max_abs=5.0, mean_rel=0.15852808952331543, max_rel=1235.43603515625, norm_rel=0.02420930564403534, ref_abs_avg=25.436874389648438, test_abs_avg=25.43368148803711
production_forward2 grad[64] vs paper_forward: mean_abs=0.5647894740104675, max_abs=4.0, mean_rel=0.2505607306957245, max_rel=2062.5, norm_rel=0.022594233974814415, ref_abs_avg=25.004112243652344, test_abs_avg=25.010208129882812
production_forward2 grad[65] vs paper_forward: mean_abs=0.4663780927658081, max_abs=1.78125, mean_rel=0.07190832495689392, max_rel=2.404695510864258, norm_rel=0.02286640554666519, ref_abs_avg=19.962997436523438, test_abs_avg=19.93768882751465
production_forward2 grad[66] vs paper_forward: mean_abs=0.5769692659378052, max_abs=4.5, mean_rel=0.1504892259836197, max_rel=944.3842163085938, norm_rel=0.024127911776304245, ref_abs_avg=23.94095802307129, test_abs_avg=23.94094467163086
production_forward2 grad[67] vs paper_forward: mean_abs=0.537763237953186, max_abs=3.9609375, mean_rel=0.2445511817932129, max_rel=2343.75, norm_rel=0.02252148650586605, ref_abs_avg=23.858699798583984, test_abs_avg=23.868202209472656
production_forward2 grad[68] vs paper_forward: mean_abs=0.4235954284667969, max_abs=2.25, mean_rel=0.11633986979722977, max_rel=9.852572441101074, norm_rel=0.02172701247036457, ref_abs_avg=19.804508209228516, test_abs_avg=19.827680587768555
production_forward2 grad[69] vs paper_forward: mean_abs=0.5540332794189453, max_abs=5.0, mean_rel=0.1493026167154312, max_rel=1004.4124145507812, norm_rel=0.023587137460708618, ref_abs_avg=23.480792999267578, test_abs_avg=23.48080825805664
production_forward2 grad[70] vs paper_forward: mean_abs=0.5077829360961914, max_abs=3.5, mean_rel=0.23198586702346802, max_rel=1687.4998779296875, norm_rel=0.02214641310274601, ref_abs_avg=22.937740325927734, test_abs_avg=22.929990768432617
production_forward2 grad[71] vs paper_forward: mean_abs=0.4130287170410156, max_abs=1.5, mean_rel=0.08166145533323288, max_rel=4.0108137130737305, norm_rel=0.021422214806079865, ref_abs_avg=19.191770553588867, test_abs_avg=19.19244956970215
production_forward2 grad[72] vs paper_forward: mean_abs=0.5276318192481995, max_abs=4.0, mean_rel=0.14682459831237793, max_rel=627.5460205078125, norm_rel=0.02344352751970291, ref_abs_avg=22.533653259277344, test_abs_avg=22.53409767150879
production_forward2 grad[73] vs paper_forward: mean_abs=0.4885445237159729, max_abs=3.5, mean_rel=0.19970865547657013, max_rel=1937.4998779296875, norm_rel=0.02169468253850937, ref_abs_avg=22.416643142700195, test_abs_avg=22.415807723999023
production_forward2 grad[74] vs paper_forward: mean_abs=0.45046544075012207, max_abs=1.8984375, mean_rel=0.10900837928056717, max_rel=6.571330547332764, norm_rel=0.02341669611632824, ref_abs_avg=19.574636459350586, test_abs_avg=19.59421157836914
production_forward2 grad[75] vs paper_forward: mean_abs=0.566820502281189, max_abs=4.5, mean_rel=0.15943077206611633, max_rel=1102.84521484375, norm_rel=0.024723615497350693, ref_abs_avg=22.94317054748535, test_abs_avg=22.942108154296875
production_forward2 grad[76] vs paper_forward: mean_abs=0.5267156362533569, max_abs=4.0, mean_rel=0.19967171549797058, max_rel=1187.5, norm_rel=0.023156261071562767, ref_abs_avg=22.753219604492188, test_abs_avg=22.746570587158203
production_forward2 grad[77] vs paper_forward: mean_abs=0.4058809280395508, max_abs=1.5, mean_rel=0.0755014568567276, max_rel=3.915604829788208, norm_rel=0.023478388786315918, ref_abs_avg=17.500629425048828, test_abs_avg=17.523151397705078
production_forward2 grad[78] vs paper_forward: mean_abs=0.5273388624191284, max_abs=5.5, mean_rel=0.14597705006599426, max_rel=563.022216796875, norm_rel=0.024184223264455795, ref_abs_avg=21.844898223876953, test_abs_avg=21.846359252929688
production_forward2 grad[79] vs paper_forward: mean_abs=0.4879382848739624, max_abs=4.0, mean_rel=0.19845205545425415, max_rel=1687.4998779296875, norm_rel=0.02235637791454792, ref_abs_avg=21.78752899169922, test_abs_avg=21.785842895507812
production_forward2 grad[80] vs paper_forward: mean_abs=0.3930702209472656, max_abs=1.75, mean_rel=0.09016318619251251, max_rel=6.175782680511475, norm_rel=0.021997764706611633, ref_abs_avg=18.13764762878418, test_abs_avg=18.141326904296875
production_forward2 grad[81] vs paper_forward: mean_abs=0.498077392578125, max_abs=4.0, mean_rel=0.1490194946527481, max_rel=895.39599609375, norm_rel=0.02377537451684475, ref_abs_avg=21.003768920898438, test_abs_avg=21.00454330444336
production_forward2 grad[82] vs paper_forward: mean_abs=0.45533454418182373, max_abs=4.0, mean_rel=0.20442670583724976, max_rel=1562.4998779296875, norm_rel=0.021871354430913925, ref_abs_avg=20.84107208251953, test_abs_avg=20.846759796142578
production_forward2 grad[83] vs paper_forward: mean_abs=0.363747239112854, max_abs=1.5, mean_rel=0.07057879120111465, max_rel=3.259699821472168, norm_rel=0.02081008441746235, ref_abs_avg=17.781660079956055, test_abs_avg=17.777080535888672
production_forward2 grad[84] vs paper_forward: mean_abs=0.47292453050613403, max_abs=5.25, mean_rel=0.1514979898929596, max_rel=834.5006713867188, norm_rel=0.023059675469994545, ref_abs_avg=20.543861389160156, test_abs_avg=20.54256820678711
production_forward2 grad[85] vs paper_forward: mean_abs=0.42671847343444824, max_abs=5.1875, mean_rel=0.20546123385429382, max_rel=2140.625, norm_rel=0.021415051072835922, ref_abs_avg=20.009241104125977, test_abs_avg=20.0081787109375
production_forward2 grad[86] vs paper_forward: mean_abs=0.36152589321136475, max_abs=1.625, mean_rel=0.25555458664894104, max_rel=55.377784729003906, norm_rel=0.022831236943602562, ref_abs_avg=15.832069396972656, test_abs_avg=15.84034538269043
production_forward2 grad[87] vs paper_forward: mean_abs=0.4418744444847107, max_abs=4.5, mean_rel=0.139777272939682, max_rel=927.7066650390625, norm_rel=0.022753475233912468, ref_abs_avg=19.4986572265625, test_abs_avg=19.49911880493164
production_forward2 grad[88] vs paper_forward: mean_abs=0.3966979384422302, max_abs=4.0, mean_rel=0.1839245706796646, max_rel=1531.2498779296875, norm_rel=0.020392367616295815, ref_abs_avg=19.41327476501465, test_abs_avg=19.42256736755371
production_forward2 grad[89] vs paper_forward: mean_abs=0.3174886703491211, max_abs=1.375, mean_rel=0.12244488298892975, max_rel=19.09549903869629, norm_rel=0.019993338733911514, ref_abs_avg=16.588550567626953, test_abs_avg=16.595319747924805
production_forward2 grad[90] vs paper_forward: mean_abs=0.4129825830459595, max_abs=4.0, mean_rel=0.13357841968536377, max_rel=674.7901611328125, norm_rel=0.022366316989064217, ref_abs_avg=18.581439971923828, test_abs_avg=18.5816593170166
production_forward2 grad[91] vs paper_forward: mean_abs=0.38649195432662964, max_abs=3.5, mean_rel=0.20815891027450562, max_rel=1937.4998779296875, norm_rel=0.021312372758984566, ref_abs_avg=18.304176330566406, test_abs_avg=18.299877166748047
production_forward2 grad[92] vs paper_forward: mean_abs=0.32126712799072266, max_abs=1.25, mean_rel=0.06913649290800095, max_rel=5.020767688751221, norm_rel=0.020588209852576256, ref_abs_avg=15.375414848327637, test_abs_avg=15.38524055480957
production_forward2 grad[93] vs paper_forward: mean_abs=0.38621026277542114, max_abs=4.5, mean_rel=0.13163869082927704, max_rel=1197.6424560546875, norm_rel=0.021748535335063934, ref_abs_avg=17.949642181396484, test_abs_avg=17.950010299682617
production_forward2 grad[94] vs paper_forward: mean_abs=0.3538472652435303, max_abs=3.5, mean_rel=0.17387117445468903, max_rel=1593.7498779296875, norm_rel=0.020577700808644295, ref_abs_avg=17.433692932128906, test_abs_avg=17.435222625732422
production_forward2 grad[95] vs paper_forward: mean_abs=0.29250383377075195, max_abs=1.0703125, mean_rel=0.06363518536090851, max_rel=4.002863883972168, norm_rel=0.019430525600910187, ref_abs_avg=15.280427932739258, test_abs_avg=15.259977340698242
production_forward2 grad[96] vs paper_forward: mean_abs=0.36681807041168213, max_abs=3.75, mean_rel=0.12579841911792755, max_rel=480.15863037109375, norm_rel=0.021260114386677742, ref_abs_avg=17.541900634765625, test_abs_avg=17.542400360107422
production_forward2 grad[97] vs paper_forward: mean_abs=0.3211383819580078, max_abs=3.5, mean_rel=0.1994909644126892, max_rel=1468.7498779296875, norm_rel=0.018813351169228554, ref_abs_avg=17.173433303833008, test_abs_avg=17.18271255493164
identity layers + randn queries
production_forward2 fwd+bwd:  113.544 ms
production_forward2 bwd-only: 95.892 ms
production_forward2 peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=2.328 GiB, fwd+bwd=10.328 GiB
production_forward fwd+bwd:  116.545 ms
production_forward bwd-only: 95.973 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.328 GiB, fwd+bwd=10.328 GiB
paper_forward fwd+bwd:  382.294 ms
paper_forward bwd-only: 302.023 ms
paper_forward peak allocated: fwd=29.707 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.744 GiB, fwd+bwd=32.494 GiB
torch_compile_phases_forward fwd+bwd:  166.635 ms
torch_compile_phases_forward bwd-only: 132.692 ms
torch_compile_phases_forward peak allocated: fwd=12.782 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.098 GiB, fwd+bwd=17.350 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016669135075062513, max_abs=0.0390625
production_forward grad[0] vs paper_forward: mean_abs=0.00853512343019247, max_abs=0.328125, mean_rel=0.07275441288948059, max_rel=118.16667938232422, norm_rel=0.01988513395190239, ref_abs_avg=0.4654313921928406, test_abs_avg=0.46544694900512695
production_forward grad[1] vs paper_forward: mean_abs=7.315954208374023, max_abs=56.0, mean_rel=0.19792263209819794, max_rel=470.15850830078125, norm_rel=0.020648173987865448, ref_abs_avg=319.9287414550781, test_abs_avg=319.9449157714844
production_forward grad[2] vs paper_forward: mean_abs=1.1715831756591797, max_abs=5.0, mean_rel=0.08071266114711761, max_rel=5.195686340332031, norm_rel=0.020948385819792747, ref_abs_avg=56.64007568359375, test_abs_avg=56.559329986572266
production_forward grad[3] vs paper_forward: mean_abs=1.6324841976165771, max_abs=13.0, mean_rel=0.17362448573112488, max_rel=1606.0982666015625, norm_rel=0.024447740986943245, ref_abs_avg=67.20026397705078, test_abs_avg=67.20258331298828
production_forward grad[4] vs paper_forward: mean_abs=1.5074924230575562, max_abs=10.0, mean_rel=0.42471200227737427, max_rel=3687.499755859375, norm_rel=0.022741813212633133, ref_abs_avg=66.6716537475586, test_abs_avg=66.67332458496094
production_forward grad[5] vs paper_forward: mean_abs=1.1614618301391602, max_abs=5.0, mean_rel=0.15097510814666748, max_rel=16.884065628051758, norm_rel=0.02377619594335556, ref_abs_avg=49.247413635253906, test_abs_avg=49.222599029541016
production_forward grad[6] vs paper_forward: mean_abs=1.442260503768921, max_abs=10.0, mean_rel=0.16211509704589844, max_rel=2123.394775390625, norm_rel=0.02431434951722622, ref_abs_avg=59.72316360473633, test_abs_avg=59.72681427001953
production_forward grad[7] vs paper_forward: mean_abs=1.3231115341186523, max_abs=8.5, mean_rel=0.33754968643188477, max_rel=3843.749755859375, norm_rel=0.02256743796169758, ref_abs_avg=58.916107177734375, test_abs_avg=58.92334747314453
production_forward grad[8] vs paper_forward: mean_abs=1.0386962890625, max_abs=3.5, mean_rel=0.07552521675825119, max_rel=3.4408793449401855, norm_rel=0.023465298116207123, ref_abs_avg=43.57536315917969, test_abs_avg=43.670799255371094
production_forward grad[9] vs paper_forward: mean_abs=1.3014707565307617, max_abs=9.0, mean_rel=0.15843695402145386, max_rel=2493.9541015625, norm_rel=0.024102626368403435, ref_abs_avg=54.33781051635742, test_abs_avg=54.34197235107422
production_forward grad[10] vs paper_forward: mean_abs=1.20188570022583, max_abs=7.1875, mean_rel=0.3542039394378662, max_rel=3593.749755859375, norm_rel=0.022479576990008354, ref_abs_avg=53.723018646240234, test_abs_avg=53.72393798828125
production_forward grad[11] vs paper_forward: mean_abs=0.9284853935241699, max_abs=3.25, mean_rel=0.0771176815032959, max_rel=4.237642765045166, norm_rel=0.02153155393898487, ref_abs_avg=43.714351654052734, test_abs_avg=43.67515563964844
production_forward grad[12] vs paper_forward: mean_abs=1.2131601572036743, max_abs=9.0, mean_rel=0.1635374128818512, max_rel=1428.9847412109375, norm_rel=0.023976240307092667, ref_abs_avg=50.97454071044922, test_abs_avg=50.975685119628906
production_forward grad[13] vs paper_forward: mean_abs=1.121347427368164, max_abs=7.75, mean_rel=0.3349427878856659, max_rel=3703.124755859375, norm_rel=0.02226077765226364, ref_abs_avg=50.59917449951172, test_abs_avg=50.60137939453125
production_forward grad[14] vs paper_forward: mean_abs=0.8880128860473633, max_abs=3.875, mean_rel=0.07292685657739639, max_rel=8.271524429321289, norm_rel=0.023034842684864998, ref_abs_avg=40.53213882446289, test_abs_avg=40.532196044921875
production_forward grad[15] vs paper_forward: mean_abs=1.1339435577392578, max_abs=7.5, mean_rel=0.16717983782291412, max_rel=1369.847412109375, norm_rel=0.023843467235565186, ref_abs_avg=47.8802375793457, test_abs_avg=47.88286590576172
production_forward grad[16] vs paper_forward: mean_abs=1.0360124111175537, max_abs=6.0, mean_rel=0.315437376499176, max_rel=3593.749755859375, norm_rel=0.022096335887908936, ref_abs_avg=47.19331359863281, test_abs_avg=47.19902420043945
production_forward grad[17] vs paper_forward: mean_abs=0.7929668426513672, max_abs=3.375, mean_rel=0.11131909489631653, max_rel=27.469161987304688, norm_rel=0.022447485476732254, ref_abs_avg=36.763729095458984, test_abs_avg=36.786930084228516
production_forward grad[18] vs paper_forward: mean_abs=1.0698645114898682, max_abs=7.0, mean_rel=0.15809427201747894, max_rel=1738.0596923828125, norm_rel=0.02353610470890999, ref_abs_avg=45.72034454345703, test_abs_avg=45.7225227355957
production_forward grad[19] vs paper_forward: mean_abs=0.9845353364944458, max_abs=6.0, mean_rel=0.2778342068195343, max_rel=3624.999755859375, norm_rel=0.02219739556312561, ref_abs_avg=44.49462127685547, test_abs_avg=44.49794006347656
production_forward grad[20] vs paper_forward: mean_abs=0.7826030254364014, max_abs=3.25, mean_rel=0.17685049772262573, max_rel=39.63286590576172, norm_rel=0.023162100464105606, ref_abs_avg=34.03915023803711, test_abs_avg=33.97290802001953
production_forward grad[21] vs paper_forward: mean_abs=1.0076411962509155, max_abs=7.0, mean_rel=0.1648237258195877, max_rel=2103.639404296875, norm_rel=0.0235383752733469, ref_abs_avg=43.072933197021484, test_abs_avg=43.07524108886719
production_forward grad[22] vs paper_forward: mean_abs=0.9284993410110474, max_abs=5.53125, mean_rel=0.2345193475484848, max_rel=2593.749755859375, norm_rel=0.021728098392486572, ref_abs_avg=42.83879852294922, test_abs_avg=42.840362548828125
production_forward grad[23] vs paper_forward: mean_abs=0.7548971176147461, max_abs=3.25, mean_rel=0.17094337940216064, max_rel=28.72235679626465, norm_rel=0.023250292986631393, ref_abs_avg=33.102813720703125, test_abs_avg=33.055137634277344
production_forward grad[24] vs paper_forward: mean_abs=0.9567643404006958, max_abs=6.0, mean_rel=0.15809938311576843, max_rel=1334.01318359375, norm_rel=0.023312877863645554, ref_abs_avg=41.27278137207031, test_abs_avg=41.275230407714844
production_forward grad[25] vs paper_forward: mean_abs=0.8805087804794312, max_abs=5.5, mean_rel=0.3187519311904907, max_rel=3374.999755859375, norm_rel=0.02193853072822094, ref_abs_avg=40.23228454589844, test_abs_avg=40.2316780090332
production_forward grad[26] vs paper_forward: mean_abs=0.8330936431884766, max_abs=3.4375, mean_rel=0.07897202670574188, max_rel=2.9664340019226074, norm_rel=0.023137139156460762, ref_abs_avg=35.357765197753906, test_abs_avg=35.42512512207031
production_forward grad[27] vs paper_forward: mean_abs=1.098006010055542, max_abs=8.0, mean_rel=0.17509377002716064, max_rel=1946.816650390625, norm_rel=0.02530018426477909, ref_abs_avg=43.664093017578125, test_abs_avg=43.66883850097656
production_forward grad[28] vs paper_forward: mean_abs=1.0182713270187378, max_abs=7.125, mean_rel=0.4024128317832947, max_rel=3874.999755859375, norm_rel=0.02371750958263874, ref_abs_avg=43.16150665283203, test_abs_avg=43.16443634033203
production_forward grad[29] vs paper_forward: mean_abs=0.738837718963623, max_abs=3.5078125, mean_rel=0.09170801937580109, max_rel=9.660316467285156, norm_rel=0.023675519973039627, ref_abs_avg=31.273643493652344, test_abs_avg=31.318744659423828
production_forward grad[30] vs paper_forward: mean_abs=1.0211352109909058, max_abs=6.5, mean_rel=0.17309196293354034, max_rel=1249.671875, norm_rel=0.025625333189964294, ref_abs_avg=40.039222717285156, test_abs_avg=40.040626525878906
production_forward grad[31] vs paper_forward: mean_abs=0.9413976669311523, max_abs=5.0, mean_rel=0.2741590440273285, max_rel=2624.999755859375, norm_rel=0.02387413941323757, ref_abs_avg=39.54896545410156, test_abs_avg=39.552528381347656
production_forward grad[32] vs paper_forward: mean_abs=0.740424394607544, max_abs=3.25, mean_rel=0.12301896512508392, max_rel=29.477663040161133, norm_rel=0.023914724588394165, ref_abs_avg=30.941232681274414, test_abs_avg=30.906097412109375
production_forward grad[33] vs paper_forward: mean_abs=0.9423919916152954, max_abs=6.0, mean_rel=0.18207649886608124, max_rel=2124.173095703125, norm_rel=0.02550695277750492, ref_abs_avg=37.13690185546875, test_abs_avg=37.13824462890625
production_forward grad[34] vs paper_forward: mean_abs=0.886747419834137, max_abs=5.625, mean_rel=0.32521411776542664, max_rel=3406.249755859375, norm_rel=0.024171750992536545, ref_abs_avg=36.84397888183594, test_abs_avg=36.849884033203125
production_forward grad[35] vs paper_forward: mean_abs=0.6631375551223755, max_abs=2.5, mean_rel=0.10931558907032013, max_rel=9.671998977661133, norm_rel=0.023800622671842575, ref_abs_avg=28.216318130493164, test_abs_avg=28.211002349853516
production_forward grad[36] vs paper_forward: mean_abs=0.886602520942688, max_abs=6.5, mean_rel=0.16955220699310303, max_rel=1245.3951416015625, norm_rel=0.0252460278570652, ref_abs_avg=35.268272399902344, test_abs_avg=35.269203186035156
production_forward grad[37] vs paper_forward: mean_abs=0.8275812268257141, max_abs=5.0625, mean_rel=0.26456719636917114, max_rel=2874.999755859375, norm_rel=0.02385522797703743, ref_abs_avg=34.84557342529297, test_abs_avg=34.847747802734375
production_forward grad[38] vs paper_forward: mean_abs=0.673591136932373, max_abs=2.5, mean_rel=0.1846342533826828, max_rel=25.61161994934082, norm_rel=0.025080420076847076, ref_abs_avg=27.334678649902344, test_abs_avg=27.319164276123047
production_forward grad[39] vs paper_forward: mean_abs=0.8427702784538269, max_abs=6.0, mean_rel=0.1679590344429016, max_rel=1930.55810546875, norm_rel=0.024988938122987747, ref_abs_avg=33.84705352783203, test_abs_avg=33.849822998046875
production_forward grad[40] vs paper_forward: mean_abs=0.7868428230285645, max_abs=5.0, mean_rel=0.3094635605812073, max_rel=2796.874755859375, norm_rel=0.023739395663142204, ref_abs_avg=33.22760009765625, test_abs_avg=33.230323791503906
production_forward grad[41] vs paper_forward: mean_abs=0.6226787567138672, max_abs=2.25, mean_rel=0.1154274046421051, max_rel=10.110129356384277, norm_rel=0.02412497065961361, ref_abs_avg=26.23602867126465, test_abs_avg=26.285572052001953
production_forward grad[42] vs paper_forward: mean_abs=0.8004120588302612, max_abs=5.25, mean_rel=0.16554437577724457, max_rel=1201.5755615234375, norm_rel=0.024785107001662254, ref_abs_avg=32.43714141845703, test_abs_avg=32.43812561035156
production_forward grad[43] vs paper_forward: mean_abs=0.7429571151733398, max_abs=4.53125, mean_rel=0.2636144161224365, max_rel=2531.25, norm_rel=0.02339812181890011, ref_abs_avg=31.844100952148438, test_abs_avg=31.846445083618164
production_forward grad[44] vs paper_forward: mean_abs=0.598213791847229, max_abs=2.875, mean_rel=0.13845762610435486, max_rel=35.15056610107422, norm_rel=0.023002883419394493, ref_abs_avg=26.662132263183594, test_abs_avg=26.703306198120117
production_forward grad[45] vs paper_forward: mean_abs=0.7643897533416748, max_abs=5.0, mean_rel=0.15625981986522675, max_rel=1271.6075439453125, norm_rel=0.024620916694402695, ref_abs_avg=31.162572860717773, test_abs_avg=31.162946701049805
production_forward grad[46] vs paper_forward: mean_abs=0.7095301747322083, max_abs=4.21875, mean_rel=0.24479907751083374, max_rel=2687.499755859375, norm_rel=0.02307509258389473, ref_abs_avg=30.84880828857422, test_abs_avg=30.85052490234375
production_forward grad[47] vs paper_forward: mean_abs=0.5662860870361328, max_abs=3.0, mean_rel=0.08248640596866608, max_rel=5.009084701538086, norm_rel=0.022111456841230392, ref_abs_avg=25.762104034423828, test_abs_avg=25.79852867126465
production_forward grad[48] vs paper_forward: mean_abs=0.7341008186340332, max_abs=5.0, mean_rel=0.16026364266872406, max_rel=1039.0343017578125, norm_rel=0.024399425834417343, ref_abs_avg=30.173736572265625, test_abs_avg=30.171737670898438
production_forward grad[49] vs paper_forward: mean_abs=0.6806095838546753, max_abs=4.75, mean_rel=0.31969231367111206, max_rel=3093.749755859375, norm_rel=0.022857099771499634, ref_abs_avg=29.87984848022461, test_abs_avg=29.877487182617188
production_forward grad[50] vs paper_forward: mean_abs=0.6385307312011719, max_abs=2.1328125, mean_rel=0.1606094390153885, max_rel=25.97101402282715, norm_rel=0.023488204926252365, ref_abs_avg=27.133922576904297, test_abs_avg=27.13699722290039
production_forward grad[51] vs paper_forward: mean_abs=0.8182997703552246, max_abs=6.0, mean_rel=0.17220525443553925, max_rel=1533.6275634765625, norm_rel=0.025812357664108276, ref_abs_avg=31.811668395996094, test_abs_avg=31.813085556030273
production_forward grad[52] vs paper_forward: mean_abs=0.7617750763893127, max_abs=5.625, mean_rel=0.2795313000679016, max_rel=2937.499755859375, norm_rel=0.024430498480796814, ref_abs_avg=31.342071533203125, test_abs_avg=31.341873168945312
production_forward grad[53] vs paper_forward: mean_abs=0.5827399492263794, max_abs=2.375, mean_rel=0.11288706213235855, max_rel=8.397071838378906, norm_rel=0.025192182511091232, ref_abs_avg=23.65726089477539, test_abs_avg=23.656476974487305
production_forward grad[54] vs paper_forward: mean_abs=0.7519451379776001, max_abs=6.0, mean_rel=0.17227137088775635, max_rel=1684.6673583984375, norm_rel=0.02529107592999935, ref_abs_avg=29.750473022460938, test_abs_avg=29.749767303466797
production_forward grad[55] vs paper_forward: mean_abs=0.6982259750366211, max_abs=4.4375, mean_rel=0.23206178843975067, max_rel=2062.5, norm_rel=0.023992808535695076, ref_abs_avg=29.120899200439453, test_abs_avg=29.11977767944336
production_forward grad[56] vs paper_forward: mean_abs=0.5586104393005371, max_abs=2.4375, mean_rel=0.10561245679855347, max_rel=4.218667507171631, norm_rel=0.02421225607395172, ref_abs_avg=23.821176528930664, test_abs_avg=23.813905715942383
production_forward grad[57] vs paper_forward: mean_abs=0.698794960975647, max_abs=6.0, mean_rel=0.16730600595474243, max_rel=2884.3203125, norm_rel=0.02506374940276146, ref_abs_avg=27.903926849365234, test_abs_avg=27.905742645263672
production_forward grad[58] vs paper_forward: mean_abs=0.6500065326690674, max_abs=4.0, mean_rel=0.2711784839630127, max_rel=2687.499755859375, norm_rel=0.02364262193441391, ref_abs_avg=27.52216911315918, test_abs_avg=27.525083541870117
production_forward grad[59] vs paper_forward: mean_abs=0.5336647033691406, max_abs=2.015625, mean_rel=0.11636392772197723, max_rel=6.058358192443848, norm_rel=0.024149205535650253, ref_abs_avg=22.148502349853516, test_abs_avg=22.13154411315918
production_forward grad[60] vs paper_forward: mean_abs=0.657214879989624, max_abs=5.0, mean_rel=0.15845826268196106, max_rel=1334.6622314453125, norm_rel=0.024485813453793526, ref_abs_avg=26.891775131225586, test_abs_avg=26.88990020751953
production_forward grad[61] vs paper_forward: mean_abs=0.6098542809486389, max_abs=4.5, mean_rel=0.26304128766059875, max_rel=1828.1248779296875, norm_rel=0.023130223155021667, ref_abs_avg=26.389955520629883, test_abs_avg=26.384958267211914
production_forward grad[62] vs paper_forward: mean_abs=0.45924806594848633, max_abs=2.25, mean_rel=0.08859384804964066, max_rel=3.4397799968719482, norm_rel=0.02349761128425598, ref_abs_avg=20.253917694091797, test_abs_avg=20.26657485961914
production_forward grad[63] vs paper_forward: mean_abs=0.6153310537338257, max_abs=5.0, mean_rel=0.15363618731498718, max_rel=708.8580932617188, norm_rel=0.024316363036632538, ref_abs_avg=25.358612060546875, test_abs_avg=25.357179641723633
production_forward grad[64] vs paper_forward: mean_abs=0.5721249580383301, max_abs=4.0, mean_rel=0.2439163774251938, max_rel=2062.5, norm_rel=0.02262524887919426, ref_abs_avg=25.336362838745117, test_abs_avg=25.33876609802246
production_forward grad[65] vs paper_forward: mean_abs=0.46156787872314453, max_abs=1.5625, mean_rel=0.08111472427845001, max_rel=8.545135498046875, norm_rel=0.021952513605356216, ref_abs_avg=21.249574661254883, test_abs_avg=21.247821807861328
production_forward grad[66] vs paper_forward: mean_abs=0.5839056968688965, max_abs=4.5, mean_rel=0.14733052253723145, max_rel=1171.9715576171875, norm_rel=0.02395525574684143, ref_abs_avg=24.401582717895508, test_abs_avg=24.40066146850586
production_forward grad[67] vs paper_forward: mean_abs=0.5455465316772461, max_abs=3.75, mean_rel=0.2455475628376007, max_rel=1812.4998779296875, norm_rel=0.02264793962240219, ref_abs_avg=24.11384391784668, test_abs_avg=24.113203048706055
production_forward grad[68] vs paper_forward: mean_abs=0.45877838134765625, max_abs=1.8125, mean_rel=0.06949815899133682, max_rel=3.51212477684021, norm_rel=0.02411634847521782, ref_abs_avg=19.6739559173584, test_abs_avg=19.66868782043457
production_forward grad[69] vs paper_forward: mean_abs=0.5579206943511963, max_abs=4.5, mean_rel=0.15462900698184967, max_rel=801.5562133789062, norm_rel=0.023589637130498886, ref_abs_avg=23.709604263305664, test_abs_avg=23.708995819091797
production_forward grad[70] vs paper_forward: mean_abs=0.5168499946594238, max_abs=3.5, mean_rel=0.22238695621490479, max_rel=2125.0, norm_rel=0.022142969071865082, ref_abs_avg=23.381969451904297, test_abs_avg=23.38032341003418
production_forward grad[71] vs paper_forward: mean_abs=0.41270875930786133, max_abs=1.6875, mean_rel=0.17017275094985962, max_rel=27.480073928833008, norm_rel=0.02108556032180786, ref_abs_avg=19.550884246826172, test_abs_avg=19.561031341552734
production_forward grad[72] vs paper_forward: mean_abs=0.5290893912315369, max_abs=5.0, mean_rel=0.14688462018966675, max_rel=490.26202392578125, norm_rel=0.02294944040477276, ref_abs_avg=23.057371139526367, test_abs_avg=23.054718017578125
production_forward grad[73] vs paper_forward: mean_abs=0.4886339604854584, max_abs=3.5, mean_rel=0.19448447227478027, max_rel=2093.75, norm_rel=0.021586263552308083, ref_abs_avg=22.639179229736328, test_abs_avg=22.64300537109375
production_forward grad[74] vs paper_forward: mean_abs=0.47684288024902344, max_abs=2.625, mean_rel=0.07732285559177399, max_rel=2.69508957862854, norm_rel=0.02414419874548912, ref_abs_avg=20.083106994628906, test_abs_avg=20.113422393798828
production_forward grad[75] vs paper_forward: mean_abs=0.6081445217132568, max_abs=5.0, mean_rel=0.1702006459236145, max_rel=1224.844482421875, norm_rel=0.024621443822979927, ref_abs_avg=24.72320556640625, test_abs_avg=24.723129272460938
production_forward grad[76] vs paper_forward: mean_abs=0.5579258799552917, max_abs=4.0, mean_rel=0.2175028920173645, max_rel=1390.6248779296875, norm_rel=0.023277467116713524, ref_abs_avg=24.061702728271484, test_abs_avg=24.058849334716797
production_forward grad[77] vs paper_forward: mean_abs=0.4190358519554138, max_abs=1.625, mean_rel=0.33938583731651306, max_rel=127.90215301513672, norm_rel=0.02227860502898693, ref_abs_avg=19.266620635986328, test_abs_avg=19.290271759033203
production_forward grad[78] vs paper_forward: mean_abs=0.5479365587234497, max_abs=4.5, mean_rel=0.1488441824913025, max_rel=769.5045776367188, norm_rel=0.023884007707238197, ref_abs_avg=22.986915588378906, test_abs_avg=22.98688316345215
production_forward grad[79] vs paper_forward: mean_abs=0.503269374370575, max_abs=4.0, mean_rel=0.21087050437927246, max_rel=1484.3748779296875, norm_rel=0.02225486934185028, ref_abs_avg=22.667810440063477, test_abs_avg=22.66936683654785
production_forward grad[80] vs paper_forward: mean_abs=0.3924596309661865, max_abs=1.375, mean_rel=0.11573285609483719, max_rel=11.142383575439453, norm_rel=0.022048303857445717, ref_abs_avg=17.885299682617188, test_abs_avg=17.8787784576416
production_forward grad[81] vs paper_forward: mean_abs=0.5013071298599243, max_abs=4.125, mean_rel=0.1474357545375824, max_rel=826.9426879882812, norm_rel=0.023280968889594078, ref_abs_avg=21.604244232177734, test_abs_avg=21.60378074645996
production_forward grad[82] vs paper_forward: mean_abs=0.4570852816104889, max_abs=3.75, mean_rel=0.18839812278747559, max_rel=968.7499389648438, norm_rel=0.021249547600746155, ref_abs_avg=21.48431396484375, test_abs_avg=21.47393035888672
production_forward grad[83] vs paper_forward: mean_abs=0.3766903877258301, max_abs=1.5, mean_rel=0.09957560151815414, max_rel=9.510940551757812, norm_rel=0.02190745621919632, ref_abs_avg=16.97597312927246, test_abs_avg=16.98925018310547
production_forward grad[84] vs paper_forward: mean_abs=0.4763204753398895, max_abs=4.5, mean_rel=0.15311022102832794, max_rel=1378.76220703125, norm_rel=0.022889593616127968, ref_abs_avg=20.893415451049805, test_abs_avg=20.89423179626465
production_forward grad[85] vs paper_forward: mean_abs=0.4397047460079193, max_abs=3.5, mean_rel=0.21050141751766205, max_rel=1718.7498779296875, norm_rel=0.020761750638484955, ref_abs_avg=21.198217391967773, test_abs_avg=21.187793731689453
production_forward grad[86] vs paper_forward: mean_abs=0.34090256690979004, max_abs=1.5, mean_rel=0.10102387517690659, max_rel=16.06536102294922, norm_rel=0.021383877843618393, ref_abs_avg=16.78297233581543, test_abs_avg=16.778974533081055
production_forward grad[87] vs paper_forward: mean_abs=0.44209688901901245, max_abs=6.5, mean_rel=0.13631457090377808, max_rel=685.87744140625, norm_rel=0.022185523062944412, ref_abs_avg=20.043304443359375, test_abs_avg=20.04409408569336
production_forward grad[88] vs paper_forward: mean_abs=0.40801945328712463, max_abs=3.5, mean_rel=0.18889479339122772, max_rel=1499.9998779296875, norm_rel=0.02091636322438717, ref_abs_avg=19.641517639160156, test_abs_avg=19.64373016357422
production_forward grad[89] vs paper_forward: mean_abs=0.3496055603027344, max_abs=1.375, mean_rel=0.08701904118061066, max_rel=4.010403156280518, norm_rel=0.020604044198989868, ref_abs_avg=16.472702026367188, test_abs_avg=16.474864959716797
production_forward grad[90] vs paper_forward: mean_abs=0.41104522347450256, max_abs=4.0625, mean_rel=0.1306266188621521, max_rel=819.8185424804688, norm_rel=0.021711235865950584, ref_abs_avg=19.084753036499023, test_abs_avg=19.085369110107422
production_forward grad[91] vs paper_forward: mean_abs=0.38505351543426514, max_abs=4.5, mean_rel=0.18138548731803894, max_rel=1749.9998779296875, norm_rel=0.020748905837535858, ref_abs_avg=18.877159118652344, test_abs_avg=18.88994598388672
production_forward grad[92] vs paper_forward: mean_abs=0.3114755153656006, max_abs=1.125, mean_rel=0.11099041253328323, max_rel=12.516861915588379, norm_rel=0.01981445960700512, ref_abs_avg=15.812870025634766, test_abs_avg=15.831107139587402
production_forward grad[93] vs paper_forward: mean_abs=0.3942340016365051, max_abs=4.5, mean_rel=0.13026180863380432, max_rel=750.9842529296875, norm_rel=0.021409302949905396, ref_abs_avg=18.665260314941406, test_abs_avg=18.663063049316406
production_forward grad[94] vs paper_forward: mean_abs=0.3655051589012146, max_abs=3.6875, mean_rel=0.195242241024971, max_rel=1562.4998779296875, norm_rel=0.020016072317957878, ref_abs_avg=18.628549575805664, test_abs_avg=18.621679306030273
production_forward grad[95] vs paper_forward: mean_abs=0.2943287491798401, max_abs=1.25, mean_rel=0.08803673088550568, max_rel=11.550985336303711, norm_rel=0.019332652911543846, ref_abs_avg=15.149154663085938, test_abs_avg=15.16796875
production_forward grad[96] vs paper_forward: mean_abs=0.37332388758659363, max_abs=5.0, mean_rel=0.12686103582382202, max_rel=1096.9521484375, norm_rel=0.02077450044453144, ref_abs_avg=18.29909896850586, test_abs_avg=18.29920196533203
production_forward grad[97] vs paper_forward: mean_abs=0.3378567695617676, max_abs=3.5, mean_rel=0.17717233300209045, max_rel=1999.9998779296875, norm_rel=0.018265342339873314, ref_abs_avg=18.643722534179688, test_abs_avg=18.63864517211914
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016706229653209448, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008883588016033173, max_abs=0.5078125, mean_rel=0.07540250569581985, max_rel=95.54508972167969, norm_rel=0.02058439515531063, ref_abs_avg=0.4654313921928406, test_abs_avg=0.46543532609939575
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.4840826988220215, max_abs=64.5, mean_rel=0.29011744260787964, max_rel=1136.9278564453125, norm_rel=0.02108413353562355, ref_abs_avg=319.9287414550781, test_abs_avg=319.8944091796875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.234715223312378, max_abs=5.25, mean_rel=0.09450194239616394, max_rel=5.972853660583496, norm_rel=0.022537749260663986, ref_abs_avg=56.64007568359375, test_abs_avg=56.591243743896484
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6833432912826538, max_abs=13.0, mean_rel=0.18550212681293488, max_rel=2003.8082275390625, norm_rel=0.02519594505429268, ref_abs_avg=67.20026397705078, test_abs_avg=67.19913482666016
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5574085712432861, max_abs=10.0, mean_rel=0.4411362409591675, max_rel=3562.499755859375, norm_rel=0.023493671789765358, ref_abs_avg=66.6716537475586, test_abs_avg=66.67623901367188
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1858692169189453, max_abs=5.25, mean_rel=0.14135265350341797, max_rel=16.170988082885742, norm_rel=0.024498378857970238, ref_abs_avg=49.247413635253906, test_abs_avg=49.2421875
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4833945035934448, max_abs=9.5, mean_rel=0.1639169454574585, max_rel=2109.69921875, norm_rel=0.025003785267472267, ref_abs_avg=59.72316360473633, test_abs_avg=59.721534729003906
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3699871301651, max_abs=9.0, mean_rel=0.3542972505092621, max_rel=3312.499755859375, norm_rel=0.023360131308436394, ref_abs_avg=58.916107177734375, test_abs_avg=58.91979217529297
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.1041088104248047, max_abs=4.25, mean_rel=0.07889457046985626, max_rel=2.36281156539917, norm_rel=0.025236379355192184, ref_abs_avg=43.57536315917969, test_abs_avg=43.664398193359375
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3378231525421143, max_abs=8.0, mean_rel=0.15919268131256104, max_rel=2320.26025390625, norm_rel=0.02475036308169365, ref_abs_avg=54.33781051635742, test_abs_avg=54.34100341796875
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2394108772277832, max_abs=8.0, mean_rel=0.36229217052459717, max_rel=2921.874755859375, norm_rel=0.02317187748849392, ref_abs_avg=53.723018646240234, test_abs_avg=53.7203369140625
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9773745536804199, max_abs=3.75, mean_rel=0.1179589182138443, max_rel=25.014665603637695, norm_rel=0.022620679810643196, ref_abs_avg=43.714351654052734, test_abs_avg=43.675514221191406
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.246873140335083, max_abs=8.0, mean_rel=0.16891610622406006, max_rel=1971.139404296875, norm_rel=0.02459779940545559, ref_abs_avg=50.97454071044922, test_abs_avg=50.9768180847168
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1517455577850342, max_abs=6.625, mean_rel=0.3520916998386383, max_rel=3124.999755859375, norm_rel=0.022848637774586678, ref_abs_avg=50.59917449951172, test_abs_avg=50.60230255126953
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.9387426376342773, max_abs=3.5, mean_rel=0.07137425243854523, max_rel=3.4124927520751953, norm_rel=0.023689061403274536, ref_abs_avg=40.53213882446289, test_abs_avg=40.50663375854492
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1648691892623901, max_abs=9.0, mean_rel=0.1754840612411499, max_rel=1510.0367431640625, norm_rel=0.024477839469909668, ref_abs_avg=47.8802375793457, test_abs_avg=47.88343048095703
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0682449340820312, max_abs=6.25, mean_rel=0.3584391474723816, max_rel=3640.624755859375, norm_rel=0.022763075307011604, ref_abs_avg=47.19331359863281, test_abs_avg=47.20050811767578
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8469133377075195, max_abs=3.5, mean_rel=0.14398859441280365, max_rel=32.33866500854492, norm_rel=0.02322806417942047, ref_abs_avg=36.763729095458984, test_abs_avg=36.73741149902344
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0975446701049805, max_abs=7.5, mean_rel=0.15834063291549683, max_rel=1220.8031005859375, norm_rel=0.0241292342543602, ref_abs_avg=45.72034454345703, test_abs_avg=45.72123718261719
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=1.0083928108215332, max_abs=6.0, mean_rel=0.2853654623031616, max_rel=2874.999755859375, norm_rel=0.02274271659553051, ref_abs_avg=44.49462127685547, test_abs_avg=44.497802734375
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.8148059844970703, max_abs=3.0, mean_rel=0.2114364504814148, max_rel=39.21207809448242, norm_rel=0.02411022037267685, ref_abs_avg=34.03915023803711, test_abs_avg=33.97967529296875
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0307881832122803, max_abs=7.0, mean_rel=0.1644238382577896, max_rel=1552.7381591796875, norm_rel=0.024069828912615776, ref_abs_avg=43.072933197021484, test_abs_avg=43.0733757019043
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9513615965843201, max_abs=6.0, mean_rel=0.2421146035194397, max_rel=2703.124755859375, norm_rel=0.02228287048637867, ref_abs_avg=42.83879852294922, test_abs_avg=42.83978271484375
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7749300003051758, max_abs=4.0, mean_rel=0.17214444279670715, max_rel=21.35032844543457, norm_rel=0.024144431576132774, ref_abs_avg=33.102813720703125, test_abs_avg=33.05341720581055
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9769394993782043, max_abs=7.5, mean_rel=0.16553550958633423, max_rel=1551.81591796875, norm_rel=0.023815542459487915, ref_abs_avg=41.27278137207031, test_abs_avg=41.27472686767578
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.9010959267616272, max_abs=5.6875, mean_rel=0.29047077894210815, max_rel=3124.999755859375, norm_rel=0.02244577743113041, ref_abs_avg=40.23228454589844, test_abs_avg=40.231468200683594
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8740463256835938, max_abs=3.4375, mean_rel=0.08255396783351898, max_rel=3.7573657035827637, norm_rel=0.02444002963602543, ref_abs_avg=35.357765197753906, test_abs_avg=35.45948791503906
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1239089965820312, max_abs=8.0, mean_rel=0.18223218619823456, max_rel=2530.84912109375, norm_rel=0.02585652470588684, ref_abs_avg=43.664093017578125, test_abs_avg=43.668067932128906
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0450596809387207, max_abs=7.4375, mean_rel=0.3779832124710083, max_rel=3765.624755859375, norm_rel=0.024306951090693474, ref_abs_avg=43.16150665283203, test_abs_avg=43.16260528564453
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7682644724845886, max_abs=3.25, mean_rel=0.08055790513753891, max_rel=3.7405643463134766, norm_rel=0.02492334134876728, ref_abs_avg=31.273643493652344, test_abs_avg=31.326396942138672
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0431876182556152, max_abs=7.0, mean_rel=0.17793402075767517, max_rel=1361.681396484375, norm_rel=0.026158710941672325, ref_abs_avg=40.039222717285156, test_abs_avg=40.042030334472656
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9638269543647766, max_abs=6.0, mean_rel=0.26077425479888916, max_rel=2312.5, norm_rel=0.02446429245173931, ref_abs_avg=39.54896545410156, test_abs_avg=39.54888916015625
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7520983219146729, max_abs=3.125, mean_rel=0.1365739107131958, max_rel=29.477663040161133, norm_rel=0.024412594735622406, ref_abs_avg=30.941232681274414, test_abs_avg=30.949037551879883
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9621431827545166, max_abs=7.0, mean_rel=0.18548721075057983, max_rel=1956.2154541015625, norm_rel=0.026034729555249214, ref_abs_avg=37.13690185546875, test_abs_avg=37.13803482055664
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.903484582901001, max_abs=5.75, mean_rel=0.3523174524307251, max_rel=3343.749755859375, norm_rel=0.02464255690574646, ref_abs_avg=36.84397888183594, test_abs_avg=36.85015869140625
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7087873816490173, max_abs=2.5, mean_rel=0.10633908957242966, max_rel=8.394433975219727, norm_rel=0.024795938283205032, ref_abs_avg=28.216318130493164, test_abs_avg=28.174049377441406
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.9032872915267944, max_abs=6.0, mean_rel=0.17085301876068115, max_rel=1355.3194580078125, norm_rel=0.025723615661263466, ref_abs_avg=35.268272399902344, test_abs_avg=35.2706184387207
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8453761339187622, max_abs=5.0, mean_rel=0.25355175137519836, max_rel=2812.499755859375, norm_rel=0.024333935230970383, ref_abs_avg=34.84557342529297, test_abs_avg=34.849037170410156
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6928839683532715, max_abs=2.5, mean_rel=0.25324028730392456, max_rel=47.70888900756836, norm_rel=0.025263171643018723, ref_abs_avg=27.334678649902344, test_abs_avg=27.334470748901367
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8588000535964966, max_abs=6.25, mean_rel=0.17253714799880981, max_rel=2304.456298828125, norm_rel=0.025432882830500603, ref_abs_avg=33.84705352783203, test_abs_avg=33.84995651245117
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.8038008213043213, max_abs=4.875, mean_rel=0.30937623977661133, max_rel=2812.499755859375, norm_rel=0.024237003177404404, ref_abs_avg=33.22760009765625, test_abs_avg=33.228515625
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6121654510498047, max_abs=2.3125, mean_rel=0.12338188290596008, max_rel=11.124873161315918, norm_rel=0.023239590227603912, ref_abs_avg=26.23602867126465, test_abs_avg=26.297700881958008
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.8143585324287415, max_abs=5.5, mean_rel=0.16705584526062012, max_rel=1651.79150390625, norm_rel=0.02520676515996456, ref_abs_avg=32.43714141845703, test_abs_avg=32.43903350830078
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7572476863861084, max_abs=4.875, mean_rel=0.27313119173049927, max_rel=2312.5, norm_rel=0.023855065926909447, ref_abs_avg=31.844100952148438, test_abs_avg=31.844078063964844
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5887972116470337, max_abs=2.75, mean_rel=0.1724838763475418, max_rel=58.670894622802734, norm_rel=0.02314825728535652, ref_abs_avg=26.662132263183594, test_abs_avg=26.719039916992188
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7756919264793396, max_abs=5.0, mean_rel=0.15845206379890442, max_rel=1031.0653076171875, norm_rel=0.02497006766498089, ref_abs_avg=31.162572860717773, test_abs_avg=31.162731170654297
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7203229665756226, max_abs=4.75, mean_rel=0.24359425902366638, max_rel=2062.5, norm_rel=0.023421362042427063, ref_abs_avg=30.84880828857422, test_abs_avg=30.85425567626953
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5709130764007568, max_abs=3.25, mean_rel=0.09019419550895691, max_rel=5.766125202178955, norm_rel=0.022294282913208008, ref_abs_avg=25.762104034423828, test_abs_avg=25.79155731201172
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7441962361335754, max_abs=5.5, mean_rel=0.16170740127563477, max_rel=1016.6536865234375, norm_rel=0.024727648124098778, ref_abs_avg=30.173736572265625, test_abs_avg=30.170761108398438
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6911764144897461, max_abs=5.375, mean_rel=0.32765722274780273, max_rel=2312.5, norm_rel=0.02320457063615322, ref_abs_avg=29.87984848022461, test_abs_avg=29.876262664794922
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6255903244018555, max_abs=2.75, mean_rel=0.1302827000617981, max_rel=9.569581985473633, norm_rel=0.023059379309415817, ref_abs_avg=27.133922576904297, test_abs_avg=27.13123321533203
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8291230201721191, max_abs=7.0, mean_rel=0.1716870367527008, max_rel=1066.511962890625, norm_rel=0.026167117059230804, ref_abs_avg=31.811668395996094, test_abs_avg=31.81354331970215
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7728230953216553, max_abs=5.5, mean_rel=0.27455833554267883, max_rel=3312.499755859375, norm_rel=0.024781621992588043, ref_abs_avg=31.342071533203125, test_abs_avg=31.342180252075195
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.583060085773468, max_abs=3.0, mean_rel=0.09532016515731812, max_rel=6.20573616027832, norm_rel=0.02523481659591198, ref_abs_avg=23.65726089477539, test_abs_avg=23.657493591308594
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7632322311401367, max_abs=5.5, mean_rel=0.1712697446346283, max_rel=1622.5323486328125, norm_rel=0.025664813816547394, ref_abs_avg=29.750473022460938, test_abs_avg=29.748937606811523
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7092007994651794, max_abs=4.5, mean_rel=0.2517663538455963, max_rel=2093.75, norm_rel=0.02436254546046257, ref_abs_avg=29.120899200439453, test_abs_avg=29.118112564086914
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5871524810791016, max_abs=2.25, mean_rel=0.10568968951702118, max_rel=7.073315620422363, norm_rel=0.025171032175421715, ref_abs_avg=23.821176528930664, test_abs_avg=23.772001266479492
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.7089451551437378, max_abs=5.5, mean_rel=0.17446157336235046, max_rel=2950.78271484375, norm_rel=0.025407619774341583, ref_abs_avg=27.903926849365234, test_abs_avg=27.90525245666504
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6626326441764832, max_abs=4.25, mean_rel=0.27072134613990784, max_rel=2156.25, norm_rel=0.024117687717080116, ref_abs_avg=27.52216911315918, test_abs_avg=27.526630401611328
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5349459648132324, max_abs=2.421875, mean_rel=0.11586054414510727, max_rel=7.821571350097656, norm_rel=0.024279046803712845, ref_abs_avg=22.148502349853516, test_abs_avg=22.145755767822266
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6649413108825684, max_abs=6.0, mean_rel=0.16133281588554382, max_rel=1192.20263671875, norm_rel=0.024765731766819954, ref_abs_avg=26.891775131225586, test_abs_avg=26.89104461669922
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6196117997169495, max_abs=4.25, mean_rel=0.26460450887680054, max_rel=2203.125, norm_rel=0.023462610319256783, ref_abs_avg=26.389955520629883, test_abs_avg=26.384784698486328
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4857292175292969, max_abs=2.0, mean_rel=0.09674059599637985, max_rel=4.252725601196289, norm_rel=0.024646250531077385, ref_abs_avg=20.253917694091797, test_abs_avg=20.26803970336914
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6241961717605591, max_abs=4.5, mean_rel=0.15542061626911163, max_rel=783.7444458007812, norm_rel=0.024653591215610504, ref_abs_avg=25.358612060546875, test_abs_avg=25.357711791992188
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5774258375167847, max_abs=4.0, mean_rel=0.2582584619522095, max_rel=2218.75, norm_rel=0.022818423807621002, ref_abs_avg=25.336362838745117, test_abs_avg=25.33536148071289
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.47606658935546875, max_abs=1.75, mean_rel=0.0743197351694107, max_rel=5.778936386108398, norm_rel=0.02213749848306179, ref_abs_avg=21.249574661254883, test_abs_avg=21.26029396057129
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5903437733650208, max_abs=4.5, mean_rel=0.14571744203567505, max_rel=621.9588623046875, norm_rel=0.02421281859278679, ref_abs_avg=24.401582717895508, test_abs_avg=24.401653289794922
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5497655868530273, max_abs=3.5625, mean_rel=0.24057835340499878, max_rel=1617.1873779296875, norm_rel=0.022818757221102715, ref_abs_avg=24.11384391784668, test_abs_avg=24.114505767822266
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4694570302963257, max_abs=1.75, mean_rel=0.06655263900756836, max_rel=1.6246105432510376, norm_rel=0.024176569655537605, ref_abs_avg=19.6739559173584, test_abs_avg=19.664628982543945
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5628479719161987, max_abs=5.0, mean_rel=0.15189307928085327, max_rel=571.8261108398438, norm_rel=0.02378089912235737, ref_abs_avg=23.709604263305664, test_abs_avg=23.709880828857422
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5201592445373535, max_abs=4.0, mean_rel=0.21480172872543335, max_rel=2375.0, norm_rel=0.022264879196882248, ref_abs_avg=23.381969451904297, test_abs_avg=23.38003921508789
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4229894280433655, max_abs=1.5625, mean_rel=0.24310247600078583, max_rel=57.888648986816406, norm_rel=0.021315842866897583, ref_abs_avg=19.550884246826172, test_abs_avg=19.561046600341797
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5332801342010498, max_abs=4.5, mean_rel=0.14875775575637817, max_rel=625.1808471679688, norm_rel=0.02313324064016342, ref_abs_avg=23.057371139526367, test_abs_avg=23.055421829223633
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.49169909954071045, max_abs=4.5, mean_rel=0.1981804072856903, max_rel=2093.75, norm_rel=0.021707095205783844, ref_abs_avg=22.639179229736328, test_abs_avg=22.6414852142334
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4959273338317871, max_abs=2.0, mean_rel=0.08389729261398315, max_rel=3.1248252391815186, norm_rel=0.024775777012109756, ref_abs_avg=20.083106994628906, test_abs_avg=20.105056762695312
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.6165326833724976, max_abs=5.0, mean_rel=0.1707264482975006, max_rel=1108.7900390625, norm_rel=0.024918336421251297, ref_abs_avg=24.72320556640625, test_abs_avg=24.72325897216797
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5627449154853821, max_abs=4.0, mean_rel=0.220418781042099, max_rel=1515.6248779296875, norm_rel=0.023479998111724854, ref_abs_avg=24.061702728271484, test_abs_avg=24.06047821044922
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.41617000102996826, max_abs=2.0, mean_rel=0.6660150289535522, max_rel=301.3800048828125, norm_rel=0.02191055938601494, ref_abs_avg=19.266620635986328, test_abs_avg=19.29851531982422
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5535213947296143, max_abs=4.5, mean_rel=0.15151962637901306, max_rel=1204.6036376953125, norm_rel=0.024117300286889076, ref_abs_avg=22.986915588378906, test_abs_avg=22.986644744873047
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5078555345535278, max_abs=4.0, mean_rel=0.20979203283786774, max_rel=1406.2498779296875, norm_rel=0.022423626855015755, ref_abs_avg=22.667810440063477, test_abs_avg=22.667932510375977
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.4044598937034607, max_abs=1.5, mean_rel=0.09642736613750458, max_rel=4.922628879547119, norm_rel=0.022256435826420784, ref_abs_avg=17.885299682617188, test_abs_avg=17.895307540893555
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5070884227752686, max_abs=4.25, mean_rel=0.1513248085975647, max_rel=906.4051513671875, norm_rel=0.023551521822810173, ref_abs_avg=21.604244232177734, test_abs_avg=21.604143142700195
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.46417587995529175, max_abs=3.75, mean_rel=0.1926262229681015, max_rel=1250.0, norm_rel=0.021570837125182152, ref_abs_avg=21.48431396484375, test_abs_avg=21.47903060913086
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3652040958404541, max_abs=1.375, mean_rel=0.12278510630130768, max_rel=20.16423797607422, norm_rel=0.02151397056877613, ref_abs_avg=16.97597312927246, test_abs_avg=16.98866844177246
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4813191294670105, max_abs=5.0, mean_rel=0.15499809384346008, max_rel=1308.779541015625, norm_rel=0.023122021928429604, ref_abs_avg=20.893415451049805, test_abs_avg=20.893657684326172
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4466049075126648, max_abs=3.375, mean_rel=0.19934892654418945, max_rel=1749.9998779296875, norm_rel=0.02110445126891136, ref_abs_avg=21.198217391967773, test_abs_avg=21.18840789794922
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.349520206451416, max_abs=1.25, mean_rel=0.10468613356351852, max_rel=20.16290283203125, norm_rel=0.021448295563459396, ref_abs_avg=16.78297233581543, test_abs_avg=16.790489196777344
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.44366908073425293, max_abs=6.0, mean_rel=0.13726414740085602, max_rel=884.6073608398438, norm_rel=0.022244658321142197, ref_abs_avg=20.043304443359375, test_abs_avg=20.044052124023438
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4140755236148834, max_abs=3.375, mean_rel=0.18669816851615906, max_rel=1312.4998779296875, norm_rel=0.021227873861789703, ref_abs_avg=19.641517639160156, test_abs_avg=19.63913345336914
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.34436798095703125, max_abs=1.3125, mean_rel=0.08788838982582092, max_rel=3.704432249069214, norm_rel=0.020640432834625244, ref_abs_avg=16.472702026367188, test_abs_avg=16.472427368164062
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4132882058620453, max_abs=4.0, mean_rel=0.1300250142812729, max_rel=632.7434692382812, norm_rel=0.02181769348680973, ref_abs_avg=19.084753036499023, test_abs_avg=19.085159301757812
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3872121572494507, max_abs=5.25, mean_rel=0.18608854711055756, max_rel=1601.5623779296875, norm_rel=0.020853986963629723, ref_abs_avg=18.877159118652344, test_abs_avg=18.88611602783203
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.30470776557922363, max_abs=1.125, mean_rel=0.13031086325645447, max_rel=22.219051361083984, norm_rel=0.019280746579170227, ref_abs_avg=15.812870025634766, test_abs_avg=15.824491500854492
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3957885801792145, max_abs=5.0, mean_rel=0.13128575682640076, max_rel=734.3020629882812, norm_rel=0.021493323147296906, ref_abs_avg=18.665260314941406, test_abs_avg=18.66300392150879
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3647005558013916, max_abs=3.5, mean_rel=0.20247645676136017, max_rel=1296.8748779296875, norm_rel=0.019947553053498268, ref_abs_avg=18.628549575805664, test_abs_avg=18.623931884765625
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.3010648488998413, max_abs=1.375, mean_rel=0.09122346341609955, max_rel=12.952353477478027, norm_rel=0.019917912781238556, ref_abs_avg=15.149154663085938, test_abs_avg=15.167022705078125
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.37515759468078613, max_abs=4.5, mean_rel=0.127582848072052, max_rel=1195.123779296875, norm_rel=0.02085968665778637, ref_abs_avg=18.29909896850586, test_abs_avg=18.29949188232422
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.33564066886901855, max_abs=3.75, mean_rel=0.17221029102802277, max_rel=2062.5, norm_rel=0.018208129331469536, ref_abs_avg=18.643722534179688, test_abs_avg=18.637842178344727
production_forward2 vs paper_forward output: mean_abs=0.0016669135075062513, max_abs=0.0390625
production_forward2 grad[0] vs paper_forward: mean_abs=0.00853512343019247, max_abs=0.328125, mean_rel=0.07275441288948059, max_rel=118.16667938232422, norm_rel=0.01988513395190239, ref_abs_avg=0.4654313921928406, test_abs_avg=0.46544694900512695
production_forward2 grad[1] vs paper_forward: mean_abs=7.315980911254883, max_abs=56.0, mean_rel=0.19792236387729645, max_rel=470.15850830078125, norm_rel=0.02064819447696209, ref_abs_avg=319.9287414550781, test_abs_avg=319.94500732421875
production_forward2 grad[2] vs paper_forward: mean_abs=1.1715831756591797, max_abs=5.0, mean_rel=0.08071266114711761, max_rel=5.195686340332031, norm_rel=0.020948385819792747, ref_abs_avg=56.64007568359375, test_abs_avg=56.559329986572266
production_forward2 grad[3] vs paper_forward: mean_abs=1.6324841976165771, max_abs=13.0, mean_rel=0.17362448573112488, max_rel=1606.0982666015625, norm_rel=0.024447740986943245, ref_abs_avg=67.20026397705078, test_abs_avg=67.20258331298828
production_forward2 grad[4] vs paper_forward: mean_abs=1.5074924230575562, max_abs=10.0, mean_rel=0.42471200227737427, max_rel=3687.499755859375, norm_rel=0.022741813212633133, ref_abs_avg=66.6716537475586, test_abs_avg=66.67332458496094
production_forward2 grad[5] vs paper_forward: mean_abs=1.1614618301391602, max_abs=5.0, mean_rel=0.15097510814666748, max_rel=16.884065628051758, norm_rel=0.02377619594335556, ref_abs_avg=49.247413635253906, test_abs_avg=49.222599029541016
production_forward2 grad[6] vs paper_forward: mean_abs=1.442260503768921, max_abs=10.0, mean_rel=0.16211509704589844, max_rel=2123.394775390625, norm_rel=0.02431434951722622, ref_abs_avg=59.72316360473633, test_abs_avg=59.72681427001953
production_forward2 grad[7] vs paper_forward: mean_abs=1.3231115341186523, max_abs=8.5, mean_rel=0.33754968643188477, max_rel=3843.749755859375, norm_rel=0.02256743796169758, ref_abs_avg=58.916107177734375, test_abs_avg=58.92334747314453
production_forward2 grad[8] vs paper_forward: mean_abs=1.0386962890625, max_abs=3.5, mean_rel=0.07552521675825119, max_rel=3.4408793449401855, norm_rel=0.023465298116207123, ref_abs_avg=43.57536315917969, test_abs_avg=43.670799255371094
production_forward2 grad[9] vs paper_forward: mean_abs=1.3014707565307617, max_abs=9.0, mean_rel=0.15843695402145386, max_rel=2493.9541015625, norm_rel=0.024102626368403435, ref_abs_avg=54.33781051635742, test_abs_avg=54.34197235107422
production_forward2 grad[10] vs paper_forward: mean_abs=1.20188570022583, max_abs=7.1875, mean_rel=0.3542039394378662, max_rel=3593.749755859375, norm_rel=0.022479576990008354, ref_abs_avg=53.723018646240234, test_abs_avg=53.72393798828125
production_forward2 grad[11] vs paper_forward: mean_abs=0.9284853935241699, max_abs=3.25, mean_rel=0.0771176815032959, max_rel=4.237642765045166, norm_rel=0.02153155393898487, ref_abs_avg=43.714351654052734, test_abs_avg=43.67515563964844
production_forward2 grad[12] vs paper_forward: mean_abs=1.2131601572036743, max_abs=9.0, mean_rel=0.1635374128818512, max_rel=1428.9847412109375, norm_rel=0.023976240307092667, ref_abs_avg=50.97454071044922, test_abs_avg=50.975685119628906
production_forward2 grad[13] vs paper_forward: mean_abs=1.121347427368164, max_abs=7.75, mean_rel=0.3349427878856659, max_rel=3703.124755859375, norm_rel=0.02226077765226364, ref_abs_avg=50.59917449951172, test_abs_avg=50.60137939453125
production_forward2 grad[14] vs paper_forward: mean_abs=0.8880128860473633, max_abs=3.875, mean_rel=0.07292685657739639, max_rel=8.271524429321289, norm_rel=0.023034842684864998, ref_abs_avg=40.53213882446289, test_abs_avg=40.532196044921875
production_forward2 grad[15] vs paper_forward: mean_abs=1.1339435577392578, max_abs=7.5, mean_rel=0.16717983782291412, max_rel=1369.847412109375, norm_rel=0.023843467235565186, ref_abs_avg=47.8802375793457, test_abs_avg=47.88286590576172
production_forward2 grad[16] vs paper_forward: mean_abs=1.0360124111175537, max_abs=6.0, mean_rel=0.315437376499176, max_rel=3593.749755859375, norm_rel=0.022096335887908936, ref_abs_avg=47.19331359863281, test_abs_avg=47.19902420043945
production_forward2 grad[17] vs paper_forward: mean_abs=0.7929668426513672, max_abs=3.375, mean_rel=0.11131909489631653, max_rel=27.469161987304688, norm_rel=0.022447485476732254, ref_abs_avg=36.763729095458984, test_abs_avg=36.786930084228516
production_forward2 grad[18] vs paper_forward: mean_abs=1.0698645114898682, max_abs=7.0, mean_rel=0.15809427201747894, max_rel=1738.0596923828125, norm_rel=0.02353610470890999, ref_abs_avg=45.72034454345703, test_abs_avg=45.7225227355957
production_forward2 grad[19] vs paper_forward: mean_abs=0.9845353364944458, max_abs=6.0, mean_rel=0.2778342068195343, max_rel=3624.999755859375, norm_rel=0.02219739556312561, ref_abs_avg=44.49462127685547, test_abs_avg=44.49794006347656
production_forward2 grad[20] vs paper_forward: mean_abs=0.7826030254364014, max_abs=3.25, mean_rel=0.17685049772262573, max_rel=39.63286590576172, norm_rel=0.023162100464105606, ref_abs_avg=34.03915023803711, test_abs_avg=33.97290802001953
production_forward2 grad[21] vs paper_forward: mean_abs=1.0076411962509155, max_abs=7.0, mean_rel=0.1648237258195877, max_rel=2103.639404296875, norm_rel=0.0235383752733469, ref_abs_avg=43.072933197021484, test_abs_avg=43.07524108886719
production_forward2 grad[22] vs paper_forward: mean_abs=0.9284993410110474, max_abs=5.53125, mean_rel=0.2345193475484848, max_rel=2593.749755859375, norm_rel=0.021728098392486572, ref_abs_avg=42.83879852294922, test_abs_avg=42.840362548828125
production_forward2 grad[23] vs paper_forward: mean_abs=0.7548971176147461, max_abs=3.25, mean_rel=0.17094337940216064, max_rel=28.72235679626465, norm_rel=0.023250292986631393, ref_abs_avg=33.102813720703125, test_abs_avg=33.055137634277344
production_forward2 grad[24] vs paper_forward: mean_abs=0.9567643404006958, max_abs=6.0, mean_rel=0.15809938311576843, max_rel=1334.01318359375, norm_rel=0.023312877863645554, ref_abs_avg=41.27278137207031, test_abs_avg=41.275230407714844
production_forward2 grad[25] vs paper_forward: mean_abs=0.8805087804794312, max_abs=5.5, mean_rel=0.3187519311904907, max_rel=3374.999755859375, norm_rel=0.02193853072822094, ref_abs_avg=40.23228454589844, test_abs_avg=40.2316780090332
production_forward2 grad[26] vs paper_forward: mean_abs=0.8330936431884766, max_abs=3.4375, mean_rel=0.07897202670574188, max_rel=2.9664340019226074, norm_rel=0.023137139156460762, ref_abs_avg=35.357765197753906, test_abs_avg=35.42512512207031
production_forward2 grad[27] vs paper_forward: mean_abs=1.098006010055542, max_abs=8.0, mean_rel=0.17509377002716064, max_rel=1946.816650390625, norm_rel=0.02530018426477909, ref_abs_avg=43.664093017578125, test_abs_avg=43.66883850097656
production_forward2 grad[28] vs paper_forward: mean_abs=1.0182713270187378, max_abs=7.125, mean_rel=0.4024128317832947, max_rel=3874.999755859375, norm_rel=0.02371750958263874, ref_abs_avg=43.16150665283203, test_abs_avg=43.16443634033203
production_forward2 grad[29] vs paper_forward: mean_abs=0.738837718963623, max_abs=3.5078125, mean_rel=0.09170801937580109, max_rel=9.660316467285156, norm_rel=0.023675519973039627, ref_abs_avg=31.273643493652344, test_abs_avg=31.318744659423828
production_forward2 grad[30] vs paper_forward: mean_abs=1.0211352109909058, max_abs=6.5, mean_rel=0.17309196293354034, max_rel=1249.671875, norm_rel=0.025625333189964294, ref_abs_avg=40.039222717285156, test_abs_avg=40.040626525878906
production_forward2 grad[31] vs paper_forward: mean_abs=0.9413976669311523, max_abs=5.0, mean_rel=0.2741590440273285, max_rel=2624.999755859375, norm_rel=0.02387413941323757, ref_abs_avg=39.54896545410156, test_abs_avg=39.552528381347656
production_forward2 grad[32] vs paper_forward: mean_abs=0.740424394607544, max_abs=3.25, mean_rel=0.12301896512508392, max_rel=29.477663040161133, norm_rel=0.023914724588394165, ref_abs_avg=30.941232681274414, test_abs_avg=30.906097412109375
production_forward2 grad[33] vs paper_forward: mean_abs=0.9423919916152954, max_abs=6.0, mean_rel=0.18207649886608124, max_rel=2124.173095703125, norm_rel=0.02550695277750492, ref_abs_avg=37.13690185546875, test_abs_avg=37.13824462890625
production_forward2 grad[34] vs paper_forward: mean_abs=0.886747419834137, max_abs=5.625, mean_rel=0.32521411776542664, max_rel=3406.249755859375, norm_rel=0.024171750992536545, ref_abs_avg=36.84397888183594, test_abs_avg=36.849884033203125
production_forward2 grad[35] vs paper_forward: mean_abs=0.6631375551223755, max_abs=2.5, mean_rel=0.10931558907032013, max_rel=9.671998977661133, norm_rel=0.023800622671842575, ref_abs_avg=28.216318130493164, test_abs_avg=28.211002349853516
production_forward2 grad[36] vs paper_forward: mean_abs=0.886602520942688, max_abs=6.5, mean_rel=0.16955220699310303, max_rel=1245.3951416015625, norm_rel=0.0252460278570652, ref_abs_avg=35.268272399902344, test_abs_avg=35.269203186035156
production_forward2 grad[37] vs paper_forward: mean_abs=0.8275812268257141, max_abs=5.0625, mean_rel=0.26456719636917114, max_rel=2874.999755859375, norm_rel=0.02385522797703743, ref_abs_avg=34.84557342529297, test_abs_avg=34.847747802734375
production_forward2 grad[38] vs paper_forward: mean_abs=0.673591136932373, max_abs=2.5, mean_rel=0.1846342533826828, max_rel=25.61161994934082, norm_rel=0.025080420076847076, ref_abs_avg=27.334678649902344, test_abs_avg=27.319164276123047
production_forward2 grad[39] vs paper_forward: mean_abs=0.8427702784538269, max_abs=6.0, mean_rel=0.1679590344429016, max_rel=1930.55810546875, norm_rel=0.024988938122987747, ref_abs_avg=33.84705352783203, test_abs_avg=33.849822998046875
production_forward2 grad[40] vs paper_forward: mean_abs=0.7868428230285645, max_abs=5.0, mean_rel=0.3094635605812073, max_rel=2796.874755859375, norm_rel=0.023739395663142204, ref_abs_avg=33.22760009765625, test_abs_avg=33.230323791503906
production_forward2 grad[41] vs paper_forward: mean_abs=0.6226787567138672, max_abs=2.25, mean_rel=0.1154274046421051, max_rel=10.110129356384277, norm_rel=0.02412497065961361, ref_abs_avg=26.23602867126465, test_abs_avg=26.285572052001953
production_forward2 grad[42] vs paper_forward: mean_abs=0.8004120588302612, max_abs=5.25, mean_rel=0.16554437577724457, max_rel=1201.5755615234375, norm_rel=0.024785107001662254, ref_abs_avg=32.43714141845703, test_abs_avg=32.43812561035156
production_forward2 grad[43] vs paper_forward: mean_abs=0.7429571151733398, max_abs=4.53125, mean_rel=0.2636144161224365, max_rel=2531.25, norm_rel=0.02339812181890011, ref_abs_avg=31.844100952148438, test_abs_avg=31.846445083618164
production_forward2 grad[44] vs paper_forward: mean_abs=0.598213791847229, max_abs=2.875, mean_rel=0.13845762610435486, max_rel=35.15056610107422, norm_rel=0.023002883419394493, ref_abs_avg=26.662132263183594, test_abs_avg=26.703306198120117
production_forward2 grad[45] vs paper_forward: mean_abs=0.7643897533416748, max_abs=5.0, mean_rel=0.15625981986522675, max_rel=1271.6075439453125, norm_rel=0.024620916694402695, ref_abs_avg=31.162572860717773, test_abs_avg=31.162946701049805
production_forward2 grad[46] vs paper_forward: mean_abs=0.7095301747322083, max_abs=4.21875, mean_rel=0.24479907751083374, max_rel=2687.499755859375, norm_rel=0.02307509258389473, ref_abs_avg=30.84880828857422, test_abs_avg=30.85052490234375
production_forward2 grad[47] vs paper_forward: mean_abs=0.5662860870361328, max_abs=3.0, mean_rel=0.08248640596866608, max_rel=5.009084701538086, norm_rel=0.022111456841230392, ref_abs_avg=25.762104034423828, test_abs_avg=25.79852867126465
production_forward2 grad[48] vs paper_forward: mean_abs=0.7341008186340332, max_abs=5.0, mean_rel=0.16026364266872406, max_rel=1039.0343017578125, norm_rel=0.024399425834417343, ref_abs_avg=30.173736572265625, test_abs_avg=30.171737670898438
production_forward2 grad[49] vs paper_forward: mean_abs=0.6806095838546753, max_abs=4.75, mean_rel=0.31969231367111206, max_rel=3093.749755859375, norm_rel=0.022857099771499634, ref_abs_avg=29.87984848022461, test_abs_avg=29.877487182617188
production_forward2 grad[50] vs paper_forward: mean_abs=0.6385307312011719, max_abs=2.1328125, mean_rel=0.1606094390153885, max_rel=25.97101402282715, norm_rel=0.023488204926252365, ref_abs_avg=27.133922576904297, test_abs_avg=27.13699722290039
production_forward2 grad[51] vs paper_forward: mean_abs=0.8182997703552246, max_abs=6.0, mean_rel=0.17220525443553925, max_rel=1533.6275634765625, norm_rel=0.025812357664108276, ref_abs_avg=31.811668395996094, test_abs_avg=31.813085556030273
production_forward2 grad[52] vs paper_forward: mean_abs=0.7617750763893127, max_abs=5.625, mean_rel=0.2795313000679016, max_rel=2937.499755859375, norm_rel=0.024430498480796814, ref_abs_avg=31.342071533203125, test_abs_avg=31.341873168945312
production_forward2 grad[53] vs paper_forward: mean_abs=0.5827399492263794, max_abs=2.375, mean_rel=0.11288706213235855, max_rel=8.397071838378906, norm_rel=0.025192182511091232, ref_abs_avg=23.65726089477539, test_abs_avg=23.656476974487305
production_forward2 grad[54] vs paper_forward: mean_abs=0.7519451379776001, max_abs=6.0, mean_rel=0.17227137088775635, max_rel=1684.6673583984375, norm_rel=0.02529107592999935, ref_abs_avg=29.750473022460938, test_abs_avg=29.749767303466797
production_forward2 grad[55] vs paper_forward: mean_abs=0.6982259750366211, max_abs=4.4375, mean_rel=0.23206178843975067, max_rel=2062.5, norm_rel=0.023992808535695076, ref_abs_avg=29.120899200439453, test_abs_avg=29.11977767944336
production_forward2 grad[56] vs paper_forward: mean_abs=0.5586104393005371, max_abs=2.4375, mean_rel=0.10561245679855347, max_rel=4.218667507171631, norm_rel=0.02421225607395172, ref_abs_avg=23.821176528930664, test_abs_avg=23.813905715942383
production_forward2 grad[57] vs paper_forward: mean_abs=0.698794960975647, max_abs=6.0, mean_rel=0.16730600595474243, max_rel=2884.3203125, norm_rel=0.02506374940276146, ref_abs_avg=27.903926849365234, test_abs_avg=27.905742645263672
production_forward2 grad[58] vs paper_forward: mean_abs=0.6500065326690674, max_abs=4.0, mean_rel=0.2711784839630127, max_rel=2687.499755859375, norm_rel=0.02364262193441391, ref_abs_avg=27.52216911315918, test_abs_avg=27.525083541870117
production_forward2 grad[59] vs paper_forward: mean_abs=0.5336647033691406, max_abs=2.015625, mean_rel=0.11636392772197723, max_rel=6.058358192443848, norm_rel=0.024149205535650253, ref_abs_avg=22.148502349853516, test_abs_avg=22.13154411315918
production_forward2 grad[60] vs paper_forward: mean_abs=0.657214879989624, max_abs=5.0, mean_rel=0.15845826268196106, max_rel=1334.6622314453125, norm_rel=0.024485813453793526, ref_abs_avg=26.891775131225586, test_abs_avg=26.88990020751953
production_forward2 grad[61] vs paper_forward: mean_abs=0.6098542809486389, max_abs=4.5, mean_rel=0.26304128766059875, max_rel=1828.1248779296875, norm_rel=0.023130223155021667, ref_abs_avg=26.389955520629883, test_abs_avg=26.384958267211914
production_forward2 grad[62] vs paper_forward: mean_abs=0.45924806594848633, max_abs=2.25, mean_rel=0.08859384804964066, max_rel=3.4397799968719482, norm_rel=0.02349761128425598, ref_abs_avg=20.253917694091797, test_abs_avg=20.26657485961914
production_forward2 grad[63] vs paper_forward: mean_abs=0.6153310537338257, max_abs=5.0, mean_rel=0.15363618731498718, max_rel=708.8580932617188, norm_rel=0.024316363036632538, ref_abs_avg=25.358612060546875, test_abs_avg=25.357179641723633
production_forward2 grad[64] vs paper_forward: mean_abs=0.5721249580383301, max_abs=4.0, mean_rel=0.2439163774251938, max_rel=2062.5, norm_rel=0.02262524887919426, ref_abs_avg=25.336362838745117, test_abs_avg=25.33876609802246
production_forward2 grad[65] vs paper_forward: mean_abs=0.46156787872314453, max_abs=1.5625, mean_rel=0.08111472427845001, max_rel=8.545135498046875, norm_rel=0.021952513605356216, ref_abs_avg=21.249574661254883, test_abs_avg=21.247821807861328
production_forward2 grad[66] vs paper_forward: mean_abs=0.5839056968688965, max_abs=4.5, mean_rel=0.14733052253723145, max_rel=1171.9715576171875, norm_rel=0.02395525574684143, ref_abs_avg=24.401582717895508, test_abs_avg=24.40066146850586
production_forward2 grad[67] vs paper_forward: mean_abs=0.5455465316772461, max_abs=3.75, mean_rel=0.2455475628376007, max_rel=1812.4998779296875, norm_rel=0.02264793962240219, ref_abs_avg=24.11384391784668, test_abs_avg=24.113203048706055
production_forward2 grad[68] vs paper_forward: mean_abs=0.45877838134765625, max_abs=1.8125, mean_rel=0.06949815899133682, max_rel=3.51212477684021, norm_rel=0.02411634847521782, ref_abs_avg=19.6739559173584, test_abs_avg=19.66868782043457
production_forward2 grad[69] vs paper_forward: mean_abs=0.5579206943511963, max_abs=4.5, mean_rel=0.15462900698184967, max_rel=801.5562133789062, norm_rel=0.023589637130498886, ref_abs_avg=23.709604263305664, test_abs_avg=23.708995819091797
production_forward2 grad[70] vs paper_forward: mean_abs=0.5168499946594238, max_abs=3.5, mean_rel=0.22238695621490479, max_rel=2125.0, norm_rel=0.022142969071865082, ref_abs_avg=23.381969451904297, test_abs_avg=23.38032341003418
production_forward2 grad[71] vs paper_forward: mean_abs=0.41270875930786133, max_abs=1.6875, mean_rel=0.17017275094985962, max_rel=27.480073928833008, norm_rel=0.02108556032180786, ref_abs_avg=19.550884246826172, test_abs_avg=19.561031341552734
production_forward2 grad[72] vs paper_forward: mean_abs=0.5290893912315369, max_abs=5.0, mean_rel=0.14688462018966675, max_rel=490.26202392578125, norm_rel=0.02294944040477276, ref_abs_avg=23.057371139526367, test_abs_avg=23.054718017578125
production_forward2 grad[73] vs paper_forward: mean_abs=0.4886339604854584, max_abs=3.5, mean_rel=0.19448447227478027, max_rel=2093.75, norm_rel=0.021586263552308083, ref_abs_avg=22.639179229736328, test_abs_avg=22.64300537109375
production_forward2 grad[74] vs paper_forward: mean_abs=0.47684288024902344, max_abs=2.625, mean_rel=0.07732285559177399, max_rel=2.69508957862854, norm_rel=0.02414419874548912, ref_abs_avg=20.083106994628906, test_abs_avg=20.113422393798828
production_forward2 grad[75] vs paper_forward: mean_abs=0.6081445217132568, max_abs=5.0, mean_rel=0.1702006459236145, max_rel=1224.844482421875, norm_rel=0.024621443822979927, ref_abs_avg=24.72320556640625, test_abs_avg=24.723129272460938
production_forward2 grad[76] vs paper_forward: mean_abs=0.5579258799552917, max_abs=4.0, mean_rel=0.2175028920173645, max_rel=1390.6248779296875, norm_rel=0.023277467116713524, ref_abs_avg=24.061702728271484, test_abs_avg=24.058849334716797
production_forward2 grad[77] vs paper_forward: mean_abs=0.4190358519554138, max_abs=1.625, mean_rel=0.33938583731651306, max_rel=127.90215301513672, norm_rel=0.02227860502898693, ref_abs_avg=19.266620635986328, test_abs_avg=19.290271759033203
production_forward2 grad[78] vs paper_forward: mean_abs=0.5479365587234497, max_abs=4.5, mean_rel=0.1488441824913025, max_rel=769.5045776367188, norm_rel=0.023884007707238197, ref_abs_avg=22.986915588378906, test_abs_avg=22.98688316345215
production_forward2 grad[79] vs paper_forward: mean_abs=0.503269374370575, max_abs=4.0, mean_rel=0.21087050437927246, max_rel=1484.3748779296875, norm_rel=0.02225486934185028, ref_abs_avg=22.667810440063477, test_abs_avg=22.66936683654785
production_forward2 grad[80] vs paper_forward: mean_abs=0.3924596309661865, max_abs=1.375, mean_rel=0.11573285609483719, max_rel=11.142383575439453, norm_rel=0.022048303857445717, ref_abs_avg=17.885299682617188, test_abs_avg=17.8787784576416
production_forward2 grad[81] vs paper_forward: mean_abs=0.5013071298599243, max_abs=4.125, mean_rel=0.1474357545375824, max_rel=826.9426879882812, norm_rel=0.023280968889594078, ref_abs_avg=21.604244232177734, test_abs_avg=21.60378074645996
production_forward2 grad[82] vs paper_forward: mean_abs=0.4570852816104889, max_abs=3.75, mean_rel=0.18839812278747559, max_rel=968.7499389648438, norm_rel=0.021249547600746155, ref_abs_avg=21.48431396484375, test_abs_avg=21.47393035888672
production_forward2 grad[83] vs paper_forward: mean_abs=0.3766903877258301, max_abs=1.5, mean_rel=0.09957560151815414, max_rel=9.510940551757812, norm_rel=0.02190745621919632, ref_abs_avg=16.97597312927246, test_abs_avg=16.98925018310547
production_forward2 grad[84] vs paper_forward: mean_abs=0.4763204753398895, max_abs=4.5, mean_rel=0.15311022102832794, max_rel=1378.76220703125, norm_rel=0.022889593616127968, ref_abs_avg=20.893415451049805, test_abs_avg=20.89423179626465
production_forward2 grad[85] vs paper_forward: mean_abs=0.4397047460079193, max_abs=3.5, mean_rel=0.21050141751766205, max_rel=1718.7498779296875, norm_rel=0.020761750638484955, ref_abs_avg=21.198217391967773, test_abs_avg=21.187793731689453
production_forward2 grad[86] vs paper_forward: mean_abs=0.34090256690979004, max_abs=1.5, mean_rel=0.10102387517690659, max_rel=16.06536102294922, norm_rel=0.021383877843618393, ref_abs_avg=16.78297233581543, test_abs_avg=16.778974533081055
production_forward2 grad[87] vs paper_forward: mean_abs=0.44209688901901245, max_abs=6.5, mean_rel=0.13631457090377808, max_rel=685.87744140625, norm_rel=0.022185523062944412, ref_abs_avg=20.043304443359375, test_abs_avg=20.04409408569336
production_forward2 grad[88] vs paper_forward: mean_abs=0.40801945328712463, max_abs=3.5, mean_rel=0.18889479339122772, max_rel=1499.9998779296875, norm_rel=0.02091636322438717, ref_abs_avg=19.641517639160156, test_abs_avg=19.64373016357422
production_forward2 grad[89] vs paper_forward: mean_abs=0.3496055603027344, max_abs=1.375, mean_rel=0.08701904118061066, max_rel=4.010403156280518, norm_rel=0.020604044198989868, ref_abs_avg=16.472702026367188, test_abs_avg=16.474864959716797
production_forward2 grad[90] vs paper_forward: mean_abs=0.41104522347450256, max_abs=4.0625, mean_rel=0.1306266188621521, max_rel=819.8185424804688, norm_rel=0.021711235865950584, ref_abs_avg=19.084753036499023, test_abs_avg=19.085369110107422
production_forward2 grad[91] vs paper_forward: mean_abs=0.38505351543426514, max_abs=4.5, mean_rel=0.18138548731803894, max_rel=1749.9998779296875, norm_rel=0.020748905837535858, ref_abs_avg=18.877159118652344, test_abs_avg=18.88994598388672
production_forward2 grad[92] vs paper_forward: mean_abs=0.3114755153656006, max_abs=1.125, mean_rel=0.11099041253328323, max_rel=12.516861915588379, norm_rel=0.01981445960700512, ref_abs_avg=15.812870025634766, test_abs_avg=15.831107139587402
production_forward2 grad[93] vs paper_forward: mean_abs=0.3942340016365051, max_abs=4.5, mean_rel=0.13026180863380432, max_rel=750.9842529296875, norm_rel=0.021409302949905396, ref_abs_avg=18.665260314941406, test_abs_avg=18.663063049316406
production_forward2 grad[94] vs paper_forward: mean_abs=0.3655051589012146, max_abs=3.6875, mean_rel=0.195242241024971, max_rel=1562.4998779296875, norm_rel=0.020016072317957878, ref_abs_avg=18.628549575805664, test_abs_avg=18.621679306030273
production_forward2 grad[95] vs paper_forward: mean_abs=0.2943287491798401, max_abs=1.25, mean_rel=0.08803673088550568, max_rel=11.550985336303711, norm_rel=0.019332652911543846, ref_abs_avg=15.149154663085938, test_abs_avg=15.16796875
production_forward2 grad[96] vs paper_forward: mean_abs=0.37332388758659363, max_abs=5.0, mean_rel=0.12686103582382202, max_rel=1096.9521484375, norm_rel=0.02077450044453144, ref_abs_avg=18.29909896850586, test_abs_avg=18.29920196533203
production_forward2 grad[97] vs paper_forward: mean_abs=0.3378567695617676, max_abs=3.5, mean_rel=0.17717233300209045, max_rel=1999.9998779296875, norm_rel=0.018265342339873314, ref_abs_avg=18.643722534179688, test_abs_avg=18.63864517211914

