identity layers + randn queries
paper_forward fwd+bwd:  382.317 ms
paper_forward bwd-only: 302.164 ms
paper_forward peak allocated: fwd=29.697 GiB, fwd+bwd=31.815 GiB
paper_forward peak reserved:  fwd=29.721 GiB, fwd+bwd=32.471 GiB
torch_compile_phases_forward fwd+bwd:  165.946 ms
torch_compile_phases_forward bwd-only: 132.703 ms
torch_compile_phases_forward peak allocated: fwd=12.781 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.078 GiB, fwd+bwd=17.330 GiB
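
The summary lines above are easier to compare as ratios. This is a minimal sketch that divides each `paper_forward` figure by the matching `torch_compile_phases_forward` figure. The dictionary layout and the `ratio` helper are illustrative and not part of the benchmark script; only the numbers come from the log.

```python
# Figures copied from the benchmark summary above (milliseconds and GiB).
# The dict layout and the helper name are hypothetical, for illustration only.
results = {
    "paper_forward": {
        "fwd_bwd_ms": 382.317,
        "bwd_ms": 302.164,
        "peak_alloc_fwd_bwd_gib": 31.815,
    },
    "torch_compile_phases_forward": {
        "fwd_bwd_ms": 165.946,
        "bwd_ms": 132.703,
        "peak_alloc_fwd_bwd_gib": 13.409,
    },
}


def ratio(metric: str) -> float:
    """paper_forward divided by torch_compile_phases_forward for one metric."""
    numerator = results["paper_forward"][metric]
    denominator = results["torch_compile_phases_forward"][metric]
    return numerator / denominator


for metric in ("fwd_bwd_ms", "bwd_ms", "peak_alloc_fwd_bwd_gib"):
    print(f"{metric}: {ratio(metric):.2f}x")
```

A ratio above 1 means `torch_compile_phases_forward` took less time or memory on that metric in this run.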
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.69s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None;
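
The 20 candidates tried for this kernel are the cross product of five `num_warps` values and four `num_stages` values, as read off the log lines above. Here is a minimal sketch that enumerates the same grid in plain Python. How the benchmark script actually builds its configs (for example with `triton.Config` objects passed to `triton.autotune`) is an assumption; the log does not show it.

```python
from itertools import product

# Values observed in the "Autotuning kernel ..." lines above.
NUM_WARPS = (1, 2, 4, 8, 16)
NUM_STAGES = (1, 2, 3, 4)

# num_warps varies slowest and num_stages fastest, matching the log order.
configs = [
    {"num_warps": warps, "num_stages": stages}
    for warps, stages in product(NUM_WARPS, NUM_STAGES)
]

for cfg in configs:
    print(f"num_warps: {cfg['num_warps']}, num_stages: {cfg['num_stages']}")
```

Each distinct autotuning key re-runs the whole sweep, which is why the same 20 lines repeat for every new key.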
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_out_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_out_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16'),
finished after 2.48s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.75s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.86s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.88s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32'),
finished after 2.61s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (5, 512, 1, 8, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 3.16s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (131072, 512, 1, 'torch.float32', 'torch.float32'),
finished after 0.96s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_2_online_softmax_merge_intrablock_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_2_online_softmax_merge_intrablock_backward_kernel,
with key as (512, 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 2.70s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_2_reduce_grad_pseudo_query_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_2_reduce_grad_pseudo_query_kernel,
with key as (131072, 512, 'torch.float32', 'torch.float32'),
finished after 1.17s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (4, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 3.90s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 32, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_reduce_grad_pseudo_queries_kernel with config BLOCK_BATCH_SEQ: 256, BLOCK_HIDDEN: 64, num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Triton autotuning for function phase_1_reduce_grad_pseudo_queries_kernel,
with key as (131072, 512, 8, 'torch.float32', 'torch.float32'),
finished after 1.06s,
best config selected: BLOCK_BATCH_SEQ: 128, BLOCK_HIDDEN: 64, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (3, 512, 8, 4, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 3.78s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (2, 512, 8, 2, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 3.31s,
best config selected: num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None;
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 1, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 2, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 4, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 8, num_ctas: 1, num_stages: 4, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 1, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 2, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 3, maxnreg: None
Autotuning kernel phase_1_batched_interblock_attention_backward_kernel with config num_warps: 16, num_ctas: 1, num_stages: 4, maxnreg: None
Triton autotuning for function phase_1_batched_interblock_attention_backward_kernel,
with key as (1, 512, 8, 1, 'torch.bfloat16', 'torch.bfloat16', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32', 'torch.float32'),
finished after 2.99s,
best config selected: num_warps: 2, num_ctas: 1, num_stages: 2, maxnreg: None;
production_forward2 fwd+bwd:  113.577 ms
production_forward2 bwd-only: 95.995 ms
production_forward2 peak allocated: fwd=3.071 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=3.299 GiB, fwd+bwd=11.299 GiB
production_forward fwd+bwd:  113.597 ms
production_forward bwd-only: 96.031 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.299 GiB, fwd+bwd=10.299 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016483073122799397, max_abs=0.0546875
production_forward grad[0] vs paper_forward: mean_abs=0.00850963406264782, max_abs=0.4375, mean_rel=0.072794109582901, max_rel=80.74031066894531, norm_rel=0.019973477348685265, ref_abs_avg=0.4635418653488159, test_abs_avg=0.4635607600212097
production_forward grad[1] vs paper_forward: mean_abs=7.392704963684082, max_abs=64.0, mean_rel=0.14511463046073914, max_rel=365.0574035644531, norm_rel=0.020964212715625763, ref_abs_avg=320.3311767578125, test_abs_avg=320.4688720703125
production_forward grad[2] vs paper_forward: mean_abs=1.2929344177246094, max_abs=5.0, mean_rel=0.08883114904165268, max_rel=5.770637035369873, norm_rel=0.022183774039149284, ref_abs_avg=57.24327087402344, test_abs_avg=57.1498908996582
production_forward grad[3] vs paper_forward: mean_abs=1.6490485668182373, max_abs=13.0, mean_rel=0.16746053099632263, max_rel=1393.175048828125, norm_rel=0.024506786838173866, ref_abs_avg=67.73782348632812, test_abs_avg=67.74073791503906
production_forward grad[4] vs paper_forward: mean_abs=1.5321176052093506, max_abs=9.875, mean_rel=0.4274808168411255, max_rel=4687.5, norm_rel=0.022858506068587303, ref_abs_avg=67.30644226074219, test_abs_avg=67.31477355957031
production_forward grad[5] vs paper_forward: mean_abs=1.0545845031738281, max_abs=4.25, mean_rel=0.10172292590141296, max_rel=4.768734931945801, norm_rel=0.023423384875059128, ref_abs_avg=46.773712158203125, test_abs_avg=46.75625991821289
production_forward grad[6] vs paper_forward: mean_abs=1.4387842416763306, max_abs=10.0, mean_rel=0.15694186091423035, max_rel=885.1193237304688, norm_rel=0.024245038628578186, ref_abs_avg=59.732994079589844, test_abs_avg=59.73745346069336
production_forward grad[7] vs paper_forward: mean_abs=1.3287022113800049, max_abs=8.5, mean_rel=0.3734366297721863, max_rel=3874.999755859375, norm_rel=0.022762654349207878, ref_abs_avg=58.63291931152344, test_abs_avg=58.629119873046875
production_forward grad[8] vs paper_forward: mean_abs=1.0186958312988281, max_abs=4.296875, mean_rel=0.09944825619459152, max_rel=4.340941429138184, norm_rel=0.022632041946053505, ref_abs_avg=44.95298767089844, test_abs_avg=44.994625091552734
production_forward grad[9] vs paper_forward: mean_abs=1.2991154193878174, max_abs=9.0, mean_rel=0.1545378565788269, max_rel=1564.8175048828125, norm_rel=0.024090571328997612, ref_abs_avg=54.25998306274414, test_abs_avg=54.260677337646484
production_forward grad[10] vs paper_forward: mean_abs=1.198035717010498, max_abs=7.5, mean_rel=0.4801519215106964, max_rel=4500.0, norm_rel=0.022616803646087646, ref_abs_avg=53.31297302246094, test_abs_avg=53.320098876953125
production_forward grad[11] vs paper_forward: mean_abs=0.9676964282989502, max_abs=3.25, mean_rel=0.1421402096748352, max_rel=12.426764488220215, norm_rel=0.022503886371850967, ref_abs_avg=43.007537841796875, test_abs_avg=43.063899993896484
production_forward grad[12] vs paper_forward: mean_abs=1.1973934173583984, max_abs=8.5, mean_rel=0.15239213407039642, max_rel=879.1559448242188, norm_rel=0.02390582673251629, ref_abs_avg=50.392417907714844, test_abs_avg=50.3941650390625
production_forward grad[13] vs paper_forward: mean_abs=1.1082522869110107, max_abs=8.0, mean_rel=0.37468230724334717, max_rel=3624.999755859375, norm_rel=0.022373458370566368, ref_abs_avg=49.786075592041016, test_abs_avg=49.78630828857422
production_forward grad[14] vs paper_forward: mean_abs=0.8641929626464844, max_abs=3.0, mean_rel=0.07823817431926727, max_rel=3.7232894897460938, norm_rel=0.02316437102854252, ref_abs_avg=36.49650573730469, test_abs_avg=36.54313278198242
production_forward grad[15] vs paper_forward: mean_abs=1.1164917945861816, max_abs=7.5, mean_rel=0.16688594222068787, max_rel=2114.716064453125, norm_rel=0.023884380236268044, ref_abs_avg=47.01763916015625, test_abs_avg=47.01746368408203
production_forward grad[16] vs paper_forward: mean_abs=1.024684190750122, max_abs=6.0, mean_rel=0.29424235224723816, max_rel=2734.374755859375, norm_rel=0.022206759080290794, ref_abs_avg=46.422996520996094, test_abs_avg=46.42253112792969
production_forward grad[17] vs paper_forward: mean_abs=0.8142795562744141, max_abs=3.25, mean_rel=0.09660416841506958, max_rel=6.65287971496582, norm_rel=0.022484101355075836, ref_abs_avg=37.062957763671875, test_abs_avg=37.106990814208984
production_forward grad[18] vs paper_forward: mean_abs=1.0596075057983398, max_abs=7.0, mean_rel=0.1608903706073761, max_rel=1260.625244140625, norm_rel=0.02364419959485531, ref_abs_avg=45.105133056640625, test_abs_avg=45.11012268066406
production_forward grad[19] vs paper_forward: mean_abs=0.9710997343063354, max_abs=6.0625, mean_rel=0.3649677336215973, max_rel=3640.624755859375, norm_rel=0.022095633670687675, ref_abs_avg=44.156517028808594, test_abs_avg=44.16495895385742
production_forward grad[20] vs paper_forward: mean_abs=0.8150510787963867, max_abs=3.0, mean_rel=0.1291193813085556, max_rel=13.586956024169922, norm_rel=0.02431187964975834, ref_abs_avg=32.83831024169922, test_abs_avg=32.82366180419922
production_forward grad[21] vs paper_forward: mean_abs=1.002755045890808, max_abs=7.5, mean_rel=0.14997053146362305, max_rel=1264.386474609375, norm_rel=0.023576129227876663, ref_abs_avg=42.76148223876953, test_abs_avg=42.765480041503906
production_forward grad[22] vs paper_forward: mean_abs=0.919110894203186, max_abs=6.5, mean_rel=0.2367900311946869, max_rel=2250.0, norm_rel=0.021803466603159904, ref_abs_avg=42.35061264038086, test_abs_avg=42.360931396484375
production_forward grad[23] vs paper_forward: mean_abs=0.7590923309326172, max_abs=3.0, mean_rel=0.09422648698091507, max_rel=11.195117950439453, norm_rel=0.02265639416873455, ref_abs_avg=33.41584777832031, test_abs_avg=33.46437454223633
production_forward grad[24] vs paper_forward: mean_abs=0.9587563276290894, max_abs=8.0, mean_rel=0.149194598197937, max_rel=2681.154541015625, norm_rel=0.023472579196095467, ref_abs_avg=41.10516357421875, test_abs_avg=41.10591506958008
production_forward grad[25] vs paper_forward: mean_abs=0.8807840347290039, max_abs=5.34375, mean_rel=0.2743687629699707, max_rel=3015.624755859375, norm_rel=0.021815147250890732, ref_abs_avg=40.58745193481445, test_abs_avg=40.588478088378906
production_forward grad[26] vs paper_forward: mean_abs=0.8806552886962891, max_abs=3.0625, mean_rel=0.10431678593158722, max_rel=14.871232032775879, norm_rel=0.024410972371697426, ref_abs_avg=36.48894500732422, test_abs_avg=36.4228515625
production_forward grad[27] vs paper_forward: mean_abs=1.1192976236343384, max_abs=8.0, mean_rel=0.16826365888118744, max_rel=1463.74267578125, norm_rel=0.02547977864742279, ref_abs_avg=44.1733283996582, test_abs_avg=44.17414093017578
production_forward grad[28] vs paper_forward: mean_abs=1.0414927005767822, max_abs=7.0, mean_rel=0.35930871963500977, max_rel=4250.0, norm_rel=0.023915166035294533, ref_abs_avg=43.76157760620117, test_abs_avg=43.75762939453125
production_forward grad[29] vs paper_forward: mean_abs=0.7940226793289185, max_abs=3.0, mean_rel=0.24780339002609253, max_rel=33.26356887817383, norm_rel=0.02294526994228363, ref_abs_avg=33.977142333984375, test_abs_avg=33.957359313964844
production_forward grad[30] vs paper_forward: mean_abs=1.0251898765563965, max_abs=7.125, mean_rel=0.18047362565994263, max_rel=1488.747802734375, norm_rel=0.02568155899643898, ref_abs_avg=40.13592529296875, test_abs_avg=40.13538360595703
production_forward grad[31] vs paper_forward: mean_abs=0.9510133266448975, max_abs=6.03125, mean_rel=0.26412689685821533, max_rel=2281.25, norm_rel=0.024136196821928024, ref_abs_avg=39.564552307128906, test_abs_avg=39.56824493408203
production_forward grad[32] vs paper_forward: mean_abs=0.7774825096130371, max_abs=3.0, mean_rel=0.16732153296470642, max_rel=24.423404693603516, norm_rel=0.025903742760419846, ref_abs_avg=30.489933013916016, test_abs_avg=30.470905303955078
production_forward grad[33] vs paper_forward: mean_abs=0.9579389095306396, max_abs=7.0, mean_rel=0.16662254929542542, max_rel=1680.688232421875, norm_rel=0.025716686621308327, ref_abs_avg=37.379493713378906, test_abs_avg=37.383262634277344
production_forward grad[34] vs paper_forward: mean_abs=0.8922616839408875, max_abs=5.25, mean_rel=0.2771426737308502, max_rel=2874.999755859375, norm_rel=0.024204682558774948, ref_abs_avg=37.01619338989258, test_abs_avg=37.02705383300781
production_forward grad[35] vs paper_forward: mean_abs=0.6764774322509766, max_abs=2.546875, mean_rel=0.11376909166574478, max_rel=11.531911849975586, norm_rel=0.023135142400860786, ref_abs_avg=29.347057342529297, test_abs_avg=29.31939697265625
production_forward grad[36] vs paper_forward: mean_abs=0.8928603529930115, max_abs=6.5, mean_rel=0.16015389561653137, max_rel=1117.2171630859375, norm_rel=0.025392305105924606, ref_abs_avg=35.32073211669922, test_abs_avg=35.324554443359375
production_forward grad[37] vs paper_forward: mean_abs=0.8332715034484863, max_abs=5.5, mean_rel=0.2747458517551422, max_rel=2062.5, norm_rel=0.024016864597797394, ref_abs_avg=34.81201171875, test_abs_avg=34.81011962890625
production_forward grad[38] vs paper_forward: mean_abs=0.6480854153633118, max_abs=3.0, mean_rel=0.11932405829429626, max_rel=9.614574432373047, norm_rel=0.02358151227235794, ref_abs_avg=27.47567367553711, test_abs_avg=27.496509552001953
production_forward grad[39] vs paper_forward: mean_abs=0.8360927700996399, max_abs=5.75, mean_rel=0.16754373908042908, max_rel=886.9954833984375, norm_rel=0.025114772841334343, ref_abs_avg=33.408267974853516, test_abs_avg=33.408958435058594
production_forward grad[40] vs paper_forward: mean_abs=0.783977210521698, max_abs=4.875, mean_rel=0.25682002305984497, max_rel=2718.749755859375, norm_rel=0.02385442890226841, ref_abs_avg=32.94148254394531, test_abs_avg=32.939308166503906
production_forward grad[41] vs paper_forward: mean_abs=0.6256797909736633, max_abs=2.5, mean_rel=0.09790411591529846, max_rel=5.648330211639404, norm_rel=0.024517124518752098, ref_abs_avg=25.666168212890625, test_abs_avg=25.601409912109375
production_forward grad[42] vs paper_forward: mean_abs=0.7966942191123962, max_abs=5.0, mean_rel=0.152598038315773, max_rel=840.2216796875, norm_rel=0.02491859532892704, ref_abs_avg=32.06758499145508, test_abs_avg=32.0678825378418
production_forward grad[43] vs paper_forward: mean_abs=0.7437758445739746, max_abs=4.5, mean_rel=0.26963499188423157, max_rel=1984.3748779296875, norm_rel=0.02350441738963127, ref_abs_avg=31.721424102783203, test_abs_avg=31.722238540649414
production_forward grad[44] vs paper_forward: mean_abs=0.6174259185791016, max_abs=3.015625, mean_rel=0.07811717689037323, max_rel=5.999468803405762, norm_rel=0.024307304993271828, ref_abs_avg=26.002662658691406, test_abs_avg=26.04584312438965
production_forward grad[45] vs paper_forward: mean_abs=0.756985068321228, max_abs=5.0, mean_rel=0.159613698720932, max_rel=1703.303955078125, norm_rel=0.024667149409651756, ref_abs_avg=30.77722930908203, test_abs_avg=30.780017852783203
production_forward grad[46] vs paper_forward: mean_abs=0.7029603123664856, max_abs=4.4375, mean_rel=0.2728354334831238, max_rel=3124.999755859375, norm_rel=0.023222729563713074, ref_abs_avg=30.39427947998047, test_abs_avg=30.40031623840332
production_forward grad[47] vs paper_forward: mean_abs=0.5586793422698975, max_abs=2.25, mean_rel=0.08571529388427734, max_rel=2.802093982696533, norm_rel=0.02284584380686283, ref_abs_avg=24.815248489379883, test_abs_avg=24.820526123046875
production_forward grad[48] vs paper_forward: mean_abs=0.7284860610961914, max_abs=5.0, mean_rel=0.16343986988067627, max_rel=1375.434326171875, norm_rel=0.024544674903154373, ref_abs_avg=29.768997192382812, test_abs_avg=29.770132064819336
production_forward grad[49] vs paper_forward: mean_abs=0.6801660656929016, max_abs=4.5, mean_rel=0.23853464424610138, max_rel=1773.4373779296875, norm_rel=0.023372387513518333, ref_abs_avg=29.155614852905273, test_abs_avg=29.15682601928711
production_forward grad[50] vs paper_forward: mean_abs=0.6410731077194214, max_abs=2.5, mean_rel=0.1987018883228302, max_rel=60.60416030883789, norm_rel=0.02444390207529068, ref_abs_avg=26.559221267700195, test_abs_avg=26.577816009521484
production_forward grad[51] vs paper_forward: mean_abs=0.8087671995162964, max_abs=6.0, mean_rel=0.16266968846321106, max_rel=1387.822021484375, norm_rel=0.026042761281132698, ref_abs_avg=31.16128921508789, test_abs_avg=31.161582946777344
production_forward grad[52] vs paper_forward: mean_abs=0.7584042549133301, max_abs=4.5625, mean_rel=0.287411630153656, max_rel=2781.249755859375, norm_rel=0.024960314854979515, ref_abs_avg=30.51162338256836, test_abs_avg=30.510801315307617
production_forward grad[53] vs paper_forward: mean_abs=0.5897030830383301, max_abs=2.25, mean_rel=0.1619981974363327, max_rel=25.25955581665039, norm_rel=0.025897284969687462, ref_abs_avg=22.798656463623047, test_abs_avg=22.860084533691406
production_forward grad[54] vs paper_forward: mean_abs=0.7475987672805786, max_abs=5.5, mean_rel=0.16832280158996582, max_rel=1224.8902587890625, norm_rel=0.025797734037041664, ref_abs_avg=29.05321502685547, test_abs_avg=29.05185317993164
production_forward grad[55] vs paper_forward: mean_abs=0.6958122253417969, max_abs=4.28125, mean_rel=0.25045761466026306, max_rel=1812.4998779296875, norm_rel=0.024758880957961082, ref_abs_avg=28.195940017700195, test_abs_avg=28.195964813232422
production_forward grad[56] vs paper_forward: mean_abs=0.49421119689941406, max_abs=2.75, mean_rel=0.06639717519283295, max_rel=2.9021451473236084, norm_rel=0.022427866235375404, ref_abs_avg=23.53277587890625, test_abs_avg=23.518041610717773
production_forward grad[57] vs paper_forward: mean_abs=0.695372462272644, max_abs=5.5, mean_rel=0.16669374704360962, max_rel=1546.611328125, norm_rel=0.025293312966823578, ref_abs_avg=27.537967681884766, test_abs_avg=27.537700653076172
production_forward grad[58] vs paper_forward: mean_abs=0.6469305753707886, max_abs=4.0, mean_rel=0.22469159960746765, max_rel=1999.9998779296875, norm_rel=0.023841656744480133, ref_abs_avg=27.171287536621094, test_abs_avg=27.176084518432617
production_forward grad[59] vs paper_forward: mean_abs=0.5309523344039917, max_abs=2.1875, mean_rel=0.14940860867500305, max_rel=14.36355972290039, norm_rel=0.025747336447238922, ref_abs_avg=20.528030395507812, test_abs_avg=20.53323745727539
production_forward grad[60] vs paper_forward: mean_abs=0.650511622428894, max_abs=5.0, mean_rel=0.17456494271755219, max_rel=2266.59912109375, norm_rel=0.025028616189956665, ref_abs_avg=26.024024963378906, test_abs_avg=26.02186393737793
production_forward grad[61] vs paper_forward: mean_abs=0.6073215007781982, max_abs=4.0, mean_rel=0.24436995387077332, max_rel=2781.249755859375, norm_rel=0.023583099246025085, ref_abs_avg=25.744098663330078, test_abs_avg=25.748802185058594
production_forward grad[62] vs paper_forward: mean_abs=0.47304463386535645, max_abs=1.71875, mean_rel=0.09327293187379837, max_rel=2.652101755142212, norm_rel=0.022353293374180794, ref_abs_avg=20.175086975097656, test_abs_avg=20.179363250732422
production_forward grad[63] vs paper_forward: mean_abs=0.6105145215988159, max_abs=5.0, mean_rel=0.15618832409381866, max_rel=657.7186279296875, norm_rel=0.02455207146704197, ref_abs_avg=24.879642486572266, test_abs_avg=24.879802703857422
production_forward grad[64] vs paper_forward: mean_abs=0.5717983245849609, max_abs=4.0, mean_rel=0.30741894245147705, max_rel=2624.999755859375, norm_rel=0.023094700649380684, ref_abs_avg=24.78605842590332, test_abs_avg=24.798099517822266
production_forward grad[65] vs paper_forward: mean_abs=0.44635486602783203, max_abs=1.78125, mean_rel=0.08934175968170166, max_rel=9.086952209472656, norm_rel=0.02233235165476799, ref_abs_avg=19.897676467895508, test_abs_avg=19.89260482788086
production_forward grad[66] vs paper_forward: mean_abs=0.5778524875640869, max_abs=4.5, mean_rel=0.15431441366672516, max_rel=903.2901611328125, norm_rel=0.023862503468990326, ref_abs_avg=24.233898162841797, test_abs_avg=24.233306884765625
production_forward grad[67] vs paper_forward: mean_abs=0.5389134883880615, max_abs=4.0, mean_rel=0.21711021661758423, max_rel=1749.9998779296875, norm_rel=0.02215246856212616, ref_abs_avg=24.287477493286133, test_abs_avg=24.281997680664062
production_forward grad[68] vs paper_forward: mean_abs=0.4301927089691162, max_abs=1.75, mean_rel=0.0693010464310646, max_rel=1.370891809463501, norm_rel=0.02228698693215847, ref_abs_avg=20.044126510620117, test_abs_avg=20.02105712890625
production_forward grad[69] vs paper_forward: mean_abs=0.5491492748260498, max_abs=5.0, mean_rel=0.15076929330825806, max_rel=1217.403564453125, norm_rel=0.023866085335612297, ref_abs_avg=23.04425048828125, test_abs_avg=23.045671463012695
production_forward grad[70] vs paper_forward: mean_abs=0.5088586807250977, max_abs=3.515625, mean_rel=0.20748254656791687, max_rel=2187.5, norm_rel=0.02178054116666317, ref_abs_avg=23.329242706298828, test_abs_avg=23.322521209716797
production_forward grad[71] vs paper_forward: mean_abs=0.38674163818359375, max_abs=1.5625, mean_rel=0.07088451087474823, max_rel=2.381079912185669, norm_rel=0.02067910134792328, ref_abs_avg=18.363231658935547, test_abs_avg=18.337921142578125
production_forward grad[72] vs paper_forward: mean_abs=0.5284834504127502, max_abs=4.59375, mean_rel=0.15216463804244995, max_rel=772.109130859375, norm_rel=0.0236225388944149, ref_abs_avg=22.376535415649414, test_abs_avg=22.377405166625977
production_forward grad[73] vs paper_forward: mean_abs=0.48607558012008667, max_abs=3.25, mean_rel=0.20863758027553558, max_rel=1125.0, norm_rel=0.02193559892475605, ref_abs_avg=22.10736083984375, test_abs_avg=22.10739517211914
production_forward grad[74] vs paper_forward: mean_abs=0.4795114994049072, max_abs=2.125, mean_rel=0.10302495211362839, max_rel=6.183169364929199, norm_rel=0.024184564128518105, ref_abs_avg=19.88960838317871, test_abs_avg=19.83453369140625
production_forward grad[75] vs paper_forward: mean_abs=0.5925881862640381, max_abs=5.0, mean_rel=0.1663198471069336, max_rel=1071.5506591796875, norm_rel=0.024811869487166405, ref_abs_avg=23.941505432128906, test_abs_avg=23.943477630615234
production_forward grad[76] vs paper_forward: mean_abs=0.5480004549026489, max_abs=3.875, mean_rel=0.20217761397361755, max_rel=2031.2498779296875, norm_rel=0.02293343096971512, ref_abs_avg=24.00171661376953, test_abs_avg=24.001853942871094
production_forward grad[77] vs paper_forward: mean_abs=0.44599682092666626, max_abs=1.5, mean_rel=0.2331911027431488, max_rel=37.596534729003906, norm_rel=0.02331848442554474, ref_abs_avg=19.023082733154297, test_abs_avg=19.059837341308594
production_forward grad[78] vs paper_forward: mean_abs=0.5485919117927551, max_abs=4.75, mean_rel=0.14867068827152252, max_rel=1042.059326171875, norm_rel=0.024125689640641212, ref_abs_avg=22.803691864013672, test_abs_avg=22.80560302734375
production_forward grad[79] vs paper_forward: mean_abs=0.5063979625701904, max_abs=3.875, mean_rel=0.20780906081199646, max_rel=1960.9373779296875, norm_rel=0.022604554891586304, ref_abs_avg=22.47801399230957, test_abs_avg=22.479162216186523
production_forward grad[80] vs paper_forward: mean_abs=0.38921791315078735, max_abs=1.71875, mean_rel=0.1269828975200653, max_rel=13.241498947143555, norm_rel=0.02238897979259491, ref_abs_avg=17.525646209716797, test_abs_avg=17.5333194732666
production_forward grad[81] vs paper_forward: mean_abs=0.5088188648223877, max_abs=4.5, mean_rel=0.1480741947889328, max_rel=1055.12109375, norm_rel=0.023430870845913887, ref_abs_avg=21.79647445678711, test_abs_avg=21.797826766967773
production_forward grad[82] vs paper_forward: mean_abs=0.46769630908966064, max_abs=4.5, mean_rel=0.24414926767349243, max_rel=1374.9998779296875, norm_rel=0.022040950134396553, ref_abs_avg=21.297077178955078, test_abs_avg=21.29102325439453
production_forward grad[83] vs paper_forward: mean_abs=0.3956001102924347, max_abs=1.65625, mean_rel=0.10353507101535797, max_rel=7.078819274902344, norm_rel=0.022467637434601784, ref_abs_avg=16.812191009521484, test_abs_avg=16.802507400512695
production_forward grad[84] vs paper_forward: mean_abs=0.48074984550476074, max_abs=6.0, mean_rel=0.14668697118759155, max_rel=886.2649536132812, norm_rel=0.023302897810935974, ref_abs_avg=20.68723487854004, test_abs_avg=20.687923431396484
production_forward grad[85] vs paper_forward: mean_abs=0.4475710391998291, max_abs=3.5, mean_rel=0.20849770307540894, max_rel=1250.0, norm_rel=0.021783461794257164, ref_abs_avg=20.669872283935547, test_abs_avg=20.669322967529297
production_forward grad[86] vs paper_forward: mean_abs=0.35093560814857483, max_abs=1.6875, mean_rel=0.11820357292890549, max_rel=8.844561576843262, norm_rel=0.020616086199879646, ref_abs_avg=17.455947875976562, test_abs_avg=17.466861724853516
production_forward grad[87] vs paper_forward: mean_abs=0.4539855122566223, max_abs=4.5, mean_rel=0.13567176461219788, max_rel=690.0256958007812, norm_rel=0.022706283256411552, ref_abs_avg=20.121456146240234, test_abs_avg=20.121959686279297
production_forward grad[88] vs paper_forward: mean_abs=0.4213108420372009, max_abs=4.0, mean_rel=0.19050097465515137, max_rel=1937.4998779296875, norm_rel=0.0212636049836874, ref_abs_avg=19.941051483154297, test_abs_avg=19.93594741821289
production_forward grad[89] vs paper_forward: mean_abs=0.3385190963745117, max_abs=1.25, mean_rel=0.07812889665365219, max_rel=2.7496490478515625, norm_rel=0.021160144358873367, ref_abs_avg=15.970993995666504, test_abs_avg=15.97926139831543
production_forward grad[90] vs paper_forward: mean_abs=0.4242115020751953, max_abs=4.5, mean_rel=0.13855285942554474, max_rel=1392.9749755859375, norm_rel=0.02208978496491909, ref_abs_avg=19.37919807434082, test_abs_avg=19.380046844482422
production_forward grad[91] vs paper_forward: mean_abs=0.3882216811180115, max_abs=3.25, mean_rel=0.15997616946697235, max_rel=1062.5, norm_rel=0.02011406049132347, ref_abs_avg=19.40580177307129, test_abs_avg=19.39910316467285
production_forward grad[92] vs paper_forward: mean_abs=0.3307461738586426, max_abs=1.265625, mean_rel=0.0702226459980011, max_rel=3.5401113033294678, norm_rel=0.020061034709215164, ref_abs_avg=16.783517837524414, test_abs_avg=16.786245346069336
production_forward grad[93] vs paper_forward: mean_abs=0.40319448709487915, max_abs=4.8125, mean_rel=0.1319807767868042, max_rel=1119.5272216796875, norm_rel=0.021609239280223846, ref_abs_avg=18.885150909423828, test_abs_avg=18.886720657348633
production_forward grad[94] vs paper_forward: mean_abs=0.36537134647369385, max_abs=3.25, mean_rel=0.16524389386177063, max_rel=1046.875, norm_rel=0.019946876913309097, ref_abs_avg=18.510009765625, test_abs_avg=18.50087547302246
production_forward grad[95] vs paper_forward: mean_abs=0.3004646301269531, max_abs=1.25, mean_rel=0.07339179515838623, max_rel=4.7839250564575195, norm_rel=0.019372349604964256, ref_abs_avg=15.578657150268555, test_abs_avg=15.592720985412598
production_forward grad[96] vs paper_forward: mean_abs=0.37940460443496704, max_abs=5.0, mean_rel=0.13221071660518646, max_rel=1049.5596923828125, norm_rel=0.021261977031826973, ref_abs_avg=18.131301879882812, test_abs_avg=18.13292694091797
production_forward grad[97] vs paper_forward: mean_abs=0.3451353907585144, max_abs=3.25, mean_rel=0.16931459307670593, max_rel=1578.1248779296875, norm_rel=0.01988258585333824, ref_abs_avg=17.683982849121094, test_abs_avg=17.681114196777344
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016515773022547364, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008857916109263897, max_abs=0.375, mean_rel=0.07542135566473007, max_rel=115.98004913330078, norm_rel=0.020659923553466797, ref_abs_avg=0.4635418653488159, test_abs_avg=0.46354615688323975
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.554398059844971, max_abs=54.5, mean_rel=0.14674939215183258, max_rel=378.17755126953125, norm_rel=0.021380538120865822, ref_abs_avg=320.3311767578125, test_abs_avg=320.5417175292969
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.308877944946289, max_abs=5.28125, mean_rel=0.08823666721582413, max_rel=6.026708602905273, norm_rel=0.022886725142598152, ref_abs_avg=57.24327087402344, test_abs_avg=57.140419006347656
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.7020647525787354, max_abs=12.25, mean_rel=0.1722956746816635, max_rel=1945.4036865234375, norm_rel=0.02528497949242592, ref_abs_avg=67.73782348632812, test_abs_avg=67.7391586303711
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5866906642913818, max_abs=10.75, mean_rel=0.45544877648353577, max_rel=5624.99951171875, norm_rel=0.023678472265601158, ref_abs_avg=67.30644226074219, test_abs_avg=67.32234191894531
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.135430097579956, max_abs=4.875, mean_rel=0.08721970021724701, max_rel=3.332123279571533, norm_rel=0.024378962814807892, ref_abs_avg=46.773712158203125, test_abs_avg=46.77793884277344
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4833041429519653, max_abs=10.0, mean_rel=0.15863338112831116, max_rel=1116.1617431640625, norm_rel=0.024980565533041954, ref_abs_avg=59.732994079589844, test_abs_avg=59.734153747558594
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.372335433959961, max_abs=8.5, mean_rel=0.38159143924713135, max_rel=3999.999755859375, norm_rel=0.02350570634007454, ref_abs_avg=58.63291931152344, test_abs_avg=58.629093170166016
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0299795866012573, max_abs=4.0, mean_rel=0.10311601310968399, max_rel=5.810488700866699, norm_rel=0.022773412987589836, ref_abs_avg=44.95298767089844, test_abs_avg=44.975990295410156
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3369890451431274, max_abs=9.0, mean_rel=0.15842582285404205, max_rel=2263.86181640625, norm_rel=0.02479151077568531, ref_abs_avg=54.25998306274414, test_abs_avg=54.259765625
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2401504516601562, max_abs=7.5, mean_rel=0.4966088533401489, max_rel=5125.0, norm_rel=0.023412484675645828, ref_abs_avg=53.31297302246094, test_abs_avg=53.315330505371094
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9950656890869141, max_abs=3.5, mean_rel=0.1502368301153183, max_rel=13.052074432373047, norm_rel=0.0233649592846632, ref_abs_avg=43.007537841796875, test_abs_avg=43.046173095703125
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.229537010192871, max_abs=8.0, mean_rel=0.1553405523300171, max_rel=621.5134887695312, norm_rel=0.024543648585677147, ref_abs_avg=50.392417907714844, test_abs_avg=50.39195251464844
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.142972469329834, max_abs=8.0, mean_rel=0.3775075674057007, max_rel=3281.249755859375, norm_rel=0.023056568577885628, ref_abs_avg=49.786075592041016, test_abs_avg=49.78843688964844
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8578140735626221, max_abs=3.125, mean_rel=0.07832588255405426, max_rel=3.6298279762268066, norm_rel=0.023420456796884537, ref_abs_avg=36.49650573730469, test_abs_avg=36.52490234375
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1432437896728516, max_abs=7.5, mean_rel=0.16964837908744812, max_rel=2643.42236328125, norm_rel=0.02445496991276741, ref_abs_avg=47.01763916015625, test_abs_avg=47.01679229736328
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0593693256378174, max_abs=6.5, mean_rel=0.32589858770370483, max_rel=2796.874755859375, norm_rel=0.022941380739212036, ref_abs_avg=46.422996520996094, test_abs_avg=46.42158889770508
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8335914611816406, max_abs=3.5, mean_rel=0.08892977237701416, max_rel=5.501794815063477, norm_rel=0.023070864379405975, ref_abs_avg=37.062957763671875, test_abs_avg=37.09575653076172
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0847814083099365, max_abs=8.0, mean_rel=0.17030373215675354, max_rel=1603.99755859375, norm_rel=0.024188289418816566, ref_abs_avg=45.105133056640625, test_abs_avg=45.10886001586914
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9953042268753052, max_abs=6.0, mean_rel=0.3920576274394989, max_rel=3171.874755859375, norm_rel=0.02265055477619171, ref_abs_avg=44.156517028808594, test_abs_avg=44.16124725341797
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.8156356811523438, max_abs=2.75, mean_rel=0.18757063150405884, max_rel=25.172861099243164, norm_rel=0.024003978818655014, ref_abs_avg=32.83831024169922, test_abs_avg=32.84986877441406
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0247652530670166, max_abs=9.0, mean_rel=0.15964138507843018, max_rel=1706.5335693359375, norm_rel=0.02409527264535427, ref_abs_avg=42.76148223876953, test_abs_avg=42.76500701904297
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.947211742401123, max_abs=6.5, mean_rel=0.28622475266456604, max_rel=3343.749755859375, norm_rel=0.02244430221617222, ref_abs_avg=42.35061264038086, test_abs_avg=42.35931396484375
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7587637901306152, max_abs=3.25, mean_rel=0.10639617592096329, max_rel=13.98608684539795, norm_rel=0.023161571472883224, ref_abs_avg=33.41584777832031, test_abs_avg=33.48571014404297
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9817303419113159, max_abs=7.0, mean_rel=0.15314605832099915, max_rel=2825.306884765625, norm_rel=0.02400190383195877, ref_abs_avg=41.10516357421875, test_abs_avg=41.106201171875
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.9049383997917175, max_abs=5.5, mean_rel=0.2777574062347412, max_rel=2999.999755859375, norm_rel=0.022403541952371597, ref_abs_avg=40.58745193481445, test_abs_avg=40.58768844604492
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8897480964660645, max_abs=3.625, mean_rel=0.11282810568809509, max_rel=11.417389869689941, norm_rel=0.024816026911139488, ref_abs_avg=36.48894500732422, test_abs_avg=36.45887756347656
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1423876285552979, max_abs=8.5, mean_rel=0.16475391387939453, max_rel=1178.0494384765625, norm_rel=0.025988399982452393, ref_abs_avg=44.1733283996582, test_abs_avg=44.174720764160156
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.065995454788208, max_abs=6.75, mean_rel=0.339189350605011, max_rel=4312.5, norm_rel=0.024502068758010864, ref_abs_avg=43.76157760620117, test_abs_avg=43.75624465942383
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7967357635498047, max_abs=3.0, mean_rel=0.21858327090740204, max_rel=29.689117431640625, norm_rel=0.02305115945637226, ref_abs_avg=33.977142333984375, test_abs_avg=33.95409393310547
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0472595691680908, max_abs=7.0, mean_rel=0.18796586990356445, max_rel=1452.0806884765625, norm_rel=0.026199588552117348, ref_abs_avg=40.13592529296875, test_abs_avg=40.134498596191406
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9770312309265137, max_abs=6.15625, mean_rel=0.29005974531173706, max_rel=2968.749755859375, norm_rel=0.024791456758975983, ref_abs_avg=39.564552307128906, test_abs_avg=39.567710876464844
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7819637060165405, max_abs=2.875, mean_rel=0.12383142113685608, max_rel=10.820002555847168, norm_rel=0.025803813710808754, ref_abs_avg=30.489933013916016, test_abs_avg=30.42111587524414
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9747881889343262, max_abs=6.5, mean_rel=0.1683415174484253, max_rel=1009.652587890625, norm_rel=0.026168910786509514, ref_abs_avg=37.379493713378906, test_abs_avg=37.38225173950195
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.9086827039718628, max_abs=6.5, mean_rel=0.290353924036026, max_rel=2156.25, norm_rel=0.02464435249567032, ref_abs_avg=37.01619338989258, test_abs_avg=37.02729415893555
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7121227979660034, max_abs=2.75, mean_rel=0.14210036396980286, max_rel=18.877084732055664, norm_rel=0.024392573162913322, ref_abs_avg=29.347057342529297, test_abs_avg=29.392175674438477
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.9084442257881165, max_abs=6.0, mean_rel=0.16196279227733612, max_rel=1193.3597412109375, norm_rel=0.025827663019299507, ref_abs_avg=35.32073211669922, test_abs_avg=35.32464599609375
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8497844934463501, max_abs=5.25, mean_rel=0.28821828961372375, max_rel=2140.625, norm_rel=0.024485155940055847, ref_abs_avg=34.81201171875, test_abs_avg=34.80901336669922
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6611177921295166, max_abs=2.25, mean_rel=0.12027643620967865, max_rel=7.383277893066406, norm_rel=0.023845190182328224, ref_abs_avg=27.47567367553711, test_abs_avg=27.495079040527344
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8504667282104492, max_abs=6.0, mean_rel=0.1699731945991516, max_rel=1271.8934326171875, norm_rel=0.025520719587802887, ref_abs_avg=33.408267974853516, test_abs_avg=33.40837097167969
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7962467074394226, max_abs=4.65625, mean_rel=0.27311795949935913, max_rel=2812.499755859375, norm_rel=0.024210486561059952, ref_abs_avg=32.94148254394531, test_abs_avg=32.94032287597656
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6303105354309082, max_abs=3.0, mean_rel=0.10759247839450836, max_rel=7.4758992195129395, norm_rel=0.02486647106707096, ref_abs_avg=25.666168212890625, test_abs_avg=25.60381317138672
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.8080847263336182, max_abs=6.0, mean_rel=0.15736189484596252, max_rel=1299.1785888671875, norm_rel=0.0252775177359581, ref_abs_avg=32.06758499145508, test_abs_avg=32.06733703613281
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7540937662124634, max_abs=4.5, mean_rel=0.27957427501678467, max_rel=1999.9998779296875, norm_rel=0.023828327655792236, ref_abs_avg=31.721424102783203, test_abs_avg=31.72186851501465
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5972199440002441, max_abs=2.59375, mean_rel=0.07466979324817657, max_rel=2.8705947399139404, norm_rel=0.02405347116291523, ref_abs_avg=26.002662658691406, test_abs_avg=26.04230308532715
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7678085565567017, max_abs=5.0, mean_rel=0.16239458322525024, max_rel=1843.28076171875, norm_rel=0.025004234164953232, ref_abs_avg=30.77722930908203, test_abs_avg=30.77981948852539
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7141710519790649, max_abs=4.3125, mean_rel=0.27345263957977295, max_rel=3624.999755859375, norm_rel=0.023573407903313637, ref_abs_avg=30.39427947998047, test_abs_avg=30.401424407958984
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5644296407699585, max_abs=2.0, mean_rel=0.08865329623222351, max_rel=2.9351749420166016, norm_rel=0.02289421670138836, ref_abs_avg=24.815248489379883, test_abs_avg=24.823551177978516
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7372745275497437, max_abs=6.0, mean_rel=0.16409620642662048, max_rel=1306.1962890625, norm_rel=0.02483152598142624, ref_abs_avg=29.768997192382812, test_abs_avg=29.77092742919922
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6889593005180359, max_abs=4.5, mean_rel=0.2582724988460541, max_rel=1937.4998779296875, norm_rel=0.023673037067055702, ref_abs_avg=29.155614852905273, test_abs_avg=29.15378189086914
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.680078387260437, max_abs=2.625, mean_rel=0.14956721663475037, max_rel=40.08710861206055, norm_rel=0.025990737602114677, ref_abs_avg=26.559221267700195, test_abs_avg=26.605266571044922
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8206952810287476, max_abs=6.0, mean_rel=0.1706598997116089, max_rel=1167.5233154296875, norm_rel=0.026417316868901253, ref_abs_avg=31.16128921508789, test_abs_avg=31.162090301513672
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7720159292221069, max_abs=4.65625, mean_rel=0.28169193863868713, max_rel=2312.5, norm_rel=0.025402449071407318, ref_abs_avg=30.51162338256836, test_abs_avg=30.507966995239258
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5751557350158691, max_abs=2.0, mean_rel=0.14017120003700256, max_rel=15.295977592468262, norm_rel=0.02536659501492977, ref_abs_avg=22.798656463623047, test_abs_avg=22.83224868774414
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7576315999031067, max_abs=7.0, mean_rel=0.16888101398944855, max_rel=1481.408203125, norm_rel=0.02612290158867836, ref_abs_avg=29.05321502685547, test_abs_avg=29.052858352661133
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7077112197875977, max_abs=4.625, mean_rel=0.2646004855632782, max_rel=1968.7498779296875, norm_rel=0.02517233043909073, ref_abs_avg=28.195940017700195, test_abs_avg=28.196002960205078
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5151258111000061, max_abs=2.25, mean_rel=0.07626619935035706, max_rel=6.330182075500488, norm_rel=0.0227150097489357, ref_abs_avg=23.53277587890625, test_abs_avg=23.53121566772461
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.7042089700698853, max_abs=6.0, mean_rel=0.16827720403671265, max_rel=1460.7100830078125, norm_rel=0.025620486587285995, ref_abs_avg=27.537967681884766, test_abs_avg=27.538169860839844
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6567220687866211, max_abs=4.25, mean_rel=0.22449827194213867, max_rel=2125.0, norm_rel=0.024206038564443588, ref_abs_avg=27.171287536621094, test_abs_avg=27.174638748168945
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.528889536857605, max_abs=2.0, mean_rel=0.17063501477241516, max_rel=14.850631713867188, norm_rel=0.025753328576683998, ref_abs_avg=20.528030395507812, test_abs_avg=20.511991500854492
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6588982343673706, max_abs=5.0, mean_rel=0.17645275592803955, max_rel=1973.6356201171875, norm_rel=0.025322983041405678, ref_abs_avg=26.024024963378906, test_abs_avg=26.02164649963379
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6141282916069031, max_abs=4.0, mean_rel=0.25007182359695435, max_rel=2874.999755859375, norm_rel=0.02385450154542923, ref_abs_avg=25.744098663330078, test_abs_avg=25.74829864501953
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4953359067440033, max_abs=1.625, mean_rel=0.10497552156448364, max_rel=4.479918479919434, norm_rel=0.02324533835053444, ref_abs_avg=20.175086975097656, test_abs_avg=20.17542266845703
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6172455549240112, max_abs=5.0, mean_rel=0.1578291952610016, max_rel=832.8416137695312, norm_rel=0.02482011541724205, ref_abs_avg=24.879642486572266, test_abs_avg=24.88003158569336
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5767136812210083, max_abs=4.0, mean_rel=0.2923479974269867, max_rel=2687.499755859375, norm_rel=0.023288749158382416, ref_abs_avg=24.78605842590332, test_abs_avg=24.79767608642578
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.44824886322021484, max_abs=1.59375, mean_rel=0.08007773011922836, max_rel=7.722439289093018, norm_rel=0.022093120962381363, ref_abs_avg=19.897676467895508, test_abs_avg=19.886077880859375
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.583572268486023, max_abs=5.0, mean_rel=0.15241894125938416, max_rel=898.5262451171875, norm_rel=0.02409455180168152, ref_abs_avg=24.233898162841797, test_abs_avg=24.231975555419922
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5456730127334595, max_abs=4.0, mean_rel=0.22875839471817017, max_rel=1749.9998779296875, norm_rel=0.022413557395339012, ref_abs_avg=24.287477493286133, test_abs_avg=24.281883239746094
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4331618547439575, max_abs=1.625, mean_rel=0.06427355110645294, max_rel=1.5169588327407837, norm_rel=0.022122010588645935, ref_abs_avg=20.044126510620117, test_abs_avg=20.025171279907227
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5546785593032837, max_abs=5.0, mean_rel=0.1514655351638794, max_rel=961.8911743164062, norm_rel=0.02410499006509781, ref_abs_avg=23.04425048828125, test_abs_avg=23.04552459716797
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5174343585968018, max_abs=3.53125, mean_rel=0.22219377756118774, max_rel=1968.7498779296875, norm_rel=0.02215871401131153, ref_abs_avg=23.329242706298828, test_abs_avg=23.321758270263672
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.389660120010376, max_abs=1.5625, mean_rel=0.06814920902252197, max_rel=2.635240077972412, norm_rel=0.020789576694369316, ref_abs_avg=18.363231658935547, test_abs_avg=18.33396339416504
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5322652459144592, max_abs=5.0, mean_rel=0.1537717878818512, max_rel=684.6367797851562, norm_rel=0.02378399483859539, ref_abs_avg=22.376535415649414, test_abs_avg=22.37759780883789
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.49241286516189575, max_abs=3.0546875, mean_rel=0.20452356338500977, max_rel=1281.25, norm_rel=0.022251617163419724, ref_abs_avg=22.10736083984375, test_abs_avg=22.108182907104492
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.46580564975738525, max_abs=2.25, mean_rel=0.09026308357715607, max_rel=6.621652603149414, norm_rel=0.02419831044971943, ref_abs_avg=19.88960838317871, test_abs_avg=19.847021102905273
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5987859964370728, max_abs=5.0, mean_rel=0.1714160293340683, max_rel=1343.439697265625, norm_rel=0.025054607540369034, ref_abs_avg=23.941505432128906, test_abs_avg=23.943103790283203
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5556272268295288, max_abs=4.0, mean_rel=0.20039209723472595, max_rel=2468.75, norm_rel=0.023253051564097404, ref_abs_avg=24.00171661376953, test_abs_avg=23.9997615814209
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4504517912864685, max_abs=1.5, mean_rel=0.35683056712150574, max_rel=94.25271606445312, norm_rel=0.023353679105639458, ref_abs_avg=19.023082733154297, test_abs_avg=19.04397201538086
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5540422797203064, max_abs=6.0, mean_rel=0.1487446427345276, max_rel=881.2118530273438, norm_rel=0.024342862889170647, ref_abs_avg=22.803691864013672, test_abs_avg=22.804975509643555
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.510145902633667, max_abs=3.5625, mean_rel=0.2157902717590332, max_rel=1992.1873779296875, norm_rel=0.022750958800315857, ref_abs_avg=22.47801399230957, test_abs_avg=22.47967529296875
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.40177440643310547, max_abs=1.75, mean_rel=0.12462539225816727, max_rel=13.1433687210083, norm_rel=0.023022113367915154, ref_abs_avg=17.525646209716797, test_abs_avg=17.56163215637207
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5134750604629517, max_abs=4.875, mean_rel=0.14768677949905396, max_rel=643.3626708984375, norm_rel=0.023621326312422752, ref_abs_avg=21.79647445678711, test_abs_avg=21.798128128051758
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4756902754306793, max_abs=3.875, mean_rel=0.24220791459083557, max_rel=1390.6248779296875, norm_rel=0.022407744079828262, ref_abs_avg=21.297077178955078, test_abs_avg=21.292173385620117
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3896365165710449, max_abs=1.78125, mean_rel=0.11155213415622711, max_rel=7.560719013214111, norm_rel=0.02298160456120968, ref_abs_avg=16.812191009521484, test_abs_avg=16.788127899169922
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.48336905241012573, max_abs=6.5, mean_rel=0.14901834726333618, max_rel=905.66845703125, norm_rel=0.023429470136761665, ref_abs_avg=20.68723487854004, test_abs_avg=20.686622619628906
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4563142657279968, max_abs=3.90625, mean_rel=0.22225171327590942, max_rel=1718.7498779296875, norm_rel=0.02225431054830551, ref_abs_avg=20.669872283935547, test_abs_avg=20.667829513549805
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.37133097648620605, max_abs=1.375, mean_rel=0.12290135771036148, max_rel=10.067745208740234, norm_rel=0.021319886669516563, ref_abs_avg=17.455947875976562, test_abs_avg=17.460281372070312
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4566163718700409, max_abs=5.0, mean_rel=0.1374375969171524, max_rel=787.944580078125, norm_rel=0.02282685600221157, ref_abs_avg=20.121456146240234, test_abs_avg=20.121767044067383
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4229288101196289, max_abs=4.0, mean_rel=0.19523078203201294, max_rel=1718.7498779296875, norm_rel=0.021330567076802254, ref_abs_avg=19.941051483154297, test_abs_avg=19.941055297851562
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.33591556549072266, max_abs=1.140625, mean_rel=0.07433965802192688, max_rel=3.640972852706909, norm_rel=0.02110300399363041, ref_abs_avg=15.970993995666504, test_abs_avg=15.970893859863281
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.42757946252822876, max_abs=4.5, mean_rel=0.14103010296821594, max_rel=1222.7236328125, norm_rel=0.02224614843726158, ref_abs_avg=19.37919807434082, test_abs_avg=19.380050659179688
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.39440202713012695, max_abs=3.125, mean_rel=0.1547536700963974, max_rel=859.3749389648438, norm_rel=0.020531602203845978, ref_abs_avg=19.40580177307129, test_abs_avg=19.395788192749023
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.33208513259887695, max_abs=1.40625, mean_rel=0.07425785064697266, max_rel=3.5549235343933105, norm_rel=0.02026055008172989, ref_abs_avg=16.783517837524414, test_abs_avg=16.784748077392578
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.4051366448402405, max_abs=5.0, mean_rel=0.13410070538520813, max_rel=1149.5806884765625, norm_rel=0.021700067445635796, ref_abs_avg=18.885150909423828, test_abs_avg=18.886625289916992
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3652483820915222, max_abs=3.5, mean_rel=0.169254332780838, max_rel=1296.8748779296875, norm_rel=0.019955115392804146, ref_abs_avg=18.510009765625, test_abs_avg=18.50348472595215
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.3056619167327881, max_abs=1.30078125, mean_rel=0.0720047876238823, max_rel=3.201310157775879, norm_rel=0.019720247015357018, ref_abs_avg=15.578657150268555, test_abs_avg=15.594236373901367
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.38007014989852905, max_abs=6.0, mean_rel=0.13100355863571167, max_rel=913.7697143554688, norm_rel=0.02130974270403385, ref_abs_avg=18.131301879882812, test_abs_avg=18.133310317993164
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3492141366004944, max_abs=3.0, mean_rel=0.162607342004776, max_rel=1312.4998779296875, norm_rel=0.019994469359517097, ref_abs_avg=17.683982849121094, test_abs_avg=17.685848236083984
production_forward2 vs paper_forward output: mean_abs=0.0016483073122799397, max_abs=0.0546875
production_forward2 grad[0] vs paper_forward: mean_abs=0.00850963406264782, max_abs=0.4375, mean_rel=0.072794109582901, max_rel=80.74031066894531, norm_rel=0.019973477348685265, ref_abs_avg=0.4635418653488159, test_abs_avg=0.4635607600212097
production_forward2 grad[1] vs paper_forward: mean_abs=7.39271879196167, max_abs=64.0, mean_rel=0.1451137810945511, max_rel=365.0574035644531, norm_rel=0.020964257419109344, ref_abs_avg=320.3311767578125, test_abs_avg=320.4688720703125
production_forward2 grad[2] vs paper_forward: mean_abs=1.2929344177246094, max_abs=5.0, mean_rel=0.08883114904165268, max_rel=5.770637035369873, norm_rel=0.022183774039149284, ref_abs_avg=57.24327087402344, test_abs_avg=57.1498908996582
production_forward2 grad[3] vs paper_forward: mean_abs=1.6490485668182373, max_abs=13.0, mean_rel=0.16746053099632263, max_rel=1393.175048828125, norm_rel=0.024506786838173866, ref_abs_avg=67.73782348632812, test_abs_avg=67.74073791503906
production_forward2 grad[4] vs paper_forward: mean_abs=1.5321176052093506, max_abs=9.875, mean_rel=0.4274808168411255, max_rel=4687.5, norm_rel=0.022858506068587303, ref_abs_avg=67.30644226074219, test_abs_avg=67.31477355957031
production_forward2 grad[5] vs paper_forward: mean_abs=1.0545845031738281, max_abs=4.25, mean_rel=0.10172292590141296, max_rel=4.768734931945801, norm_rel=0.023423384875059128, ref_abs_avg=46.773712158203125, test_abs_avg=46.75625991821289
production_forward2 grad[6] vs paper_forward: mean_abs=1.4387842416763306, max_abs=10.0, mean_rel=0.15694186091423035, max_rel=885.1193237304688, norm_rel=0.024245038628578186, ref_abs_avg=59.732994079589844, test_abs_avg=59.73745346069336
production_forward2 grad[7] vs paper_forward: mean_abs=1.3287022113800049, max_abs=8.5, mean_rel=0.3734366297721863, max_rel=3874.999755859375, norm_rel=0.022762654349207878, ref_abs_avg=58.63291931152344, test_abs_avg=58.629119873046875
production_forward2 grad[8] vs paper_forward: mean_abs=1.0186958312988281, max_abs=4.296875, mean_rel=0.09944825619459152, max_rel=4.340941429138184, norm_rel=0.022632041946053505, ref_abs_avg=44.95298767089844, test_abs_avg=44.994625091552734
production_forward2 grad[9] vs paper_forward: mean_abs=1.2991154193878174, max_abs=9.0, mean_rel=0.1545378565788269, max_rel=1564.8175048828125, norm_rel=0.024090571328997612, ref_abs_avg=54.25998306274414, test_abs_avg=54.260677337646484
production_forward2 grad[10] vs paper_forward: mean_abs=1.198035717010498, max_abs=7.5, mean_rel=0.4801519215106964, max_rel=4500.0, norm_rel=0.022616803646087646, ref_abs_avg=53.31297302246094, test_abs_avg=53.320098876953125
production_forward2 grad[11] vs paper_forward: mean_abs=0.9676964282989502, max_abs=3.25, mean_rel=0.1421402096748352, max_rel=12.426764488220215, norm_rel=0.022503886371850967, ref_abs_avg=43.007537841796875, test_abs_avg=43.063899993896484
production_forward2 grad[12] vs paper_forward: mean_abs=1.1973934173583984, max_abs=8.5, mean_rel=0.15239213407039642, max_rel=879.1559448242188, norm_rel=0.02390582673251629, ref_abs_avg=50.392417907714844, test_abs_avg=50.3941650390625
production_forward2 grad[13] vs paper_forward: mean_abs=1.1082522869110107, max_abs=8.0, mean_rel=0.37468230724334717, max_rel=3624.999755859375, norm_rel=0.022373458370566368, ref_abs_avg=49.786075592041016, test_abs_avg=49.78630828857422
production_forward2 grad[14] vs paper_forward: mean_abs=0.8641929626464844, max_abs=3.0, mean_rel=0.07823817431926727, max_rel=3.7232894897460938, norm_rel=0.02316437102854252, ref_abs_avg=36.49650573730469, test_abs_avg=36.54313278198242
production_forward2 grad[15] vs paper_forward: mean_abs=1.1164917945861816, max_abs=7.5, mean_rel=0.16688594222068787, max_rel=2114.716064453125, norm_rel=0.023884380236268044, ref_abs_avg=47.01763916015625, test_abs_avg=47.01746368408203
production_forward2 grad[16] vs paper_forward: mean_abs=1.024684190750122, max_abs=6.0, mean_rel=0.29424235224723816, max_rel=2734.374755859375, norm_rel=0.022206759080290794, ref_abs_avg=46.422996520996094, test_abs_avg=46.42253112792969
production_forward2 grad[17] vs paper_forward: mean_abs=0.8142795562744141, max_abs=3.25, mean_rel=0.09660416841506958, max_rel=6.65287971496582, norm_rel=0.022484101355075836, ref_abs_avg=37.062957763671875, test_abs_avg=37.106990814208984
production_forward2 grad[18] vs paper_forward: mean_abs=1.0596075057983398, max_abs=7.0, mean_rel=0.1608903706073761, max_rel=1260.625244140625, norm_rel=0.02364419959485531, ref_abs_avg=45.105133056640625, test_abs_avg=45.11012268066406
production_forward2 grad[19] vs paper_forward: mean_abs=0.9710997343063354, max_abs=6.0625, mean_rel=0.3649677336215973, max_rel=3640.624755859375, norm_rel=0.022095633670687675, ref_abs_avg=44.156517028808594, test_abs_avg=44.16495895385742
production_forward2 grad[20] vs paper_forward: mean_abs=0.8150510787963867, max_abs=3.0, mean_rel=0.1291193813085556, max_rel=13.586956024169922, norm_rel=0.02431187964975834, ref_abs_avg=32.83831024169922, test_abs_avg=32.82366180419922
production_forward2 grad[21] vs paper_forward: mean_abs=1.002755045890808, max_abs=7.5, mean_rel=0.14997053146362305, max_rel=1264.386474609375, norm_rel=0.023576129227876663, ref_abs_avg=42.76148223876953, test_abs_avg=42.765480041503906
production_forward2 grad[22] vs paper_forward: mean_abs=0.919110894203186, max_abs=6.5, mean_rel=0.2367900311946869, max_rel=2250.0, norm_rel=0.021803466603159904, ref_abs_avg=42.35061264038086, test_abs_avg=42.360931396484375
production_forward2 grad[23] vs paper_forward: mean_abs=0.7590923309326172, max_abs=3.0, mean_rel=0.09422648698091507, max_rel=11.195117950439453, norm_rel=0.02265639416873455, ref_abs_avg=33.41584777832031, test_abs_avg=33.46437454223633
production_forward2 grad[24] vs paper_forward: mean_abs=0.9587563276290894, max_abs=8.0, mean_rel=0.149194598197937, max_rel=2681.154541015625, norm_rel=0.023472579196095467, ref_abs_avg=41.10516357421875, test_abs_avg=41.10591506958008
production_forward2 grad[25] vs paper_forward: mean_abs=0.8807840347290039, max_abs=5.34375, mean_rel=0.2743687629699707, max_rel=3015.624755859375, norm_rel=0.021815147250890732, ref_abs_avg=40.58745193481445, test_abs_avg=40.588478088378906
production_forward2 grad[26] vs paper_forward: mean_abs=0.8806552886962891, max_abs=3.0625, mean_rel=0.10431678593158722, max_rel=14.871232032775879, norm_rel=0.024410972371697426, ref_abs_avg=36.48894500732422, test_abs_avg=36.4228515625
production_forward2 grad[27] vs paper_forward: mean_abs=1.1192976236343384, max_abs=8.0, mean_rel=0.16826365888118744, max_rel=1463.74267578125, norm_rel=0.02547977864742279, ref_abs_avg=44.1733283996582, test_abs_avg=44.17414093017578
production_forward2 grad[28] vs paper_forward: mean_abs=1.0414927005767822, max_abs=7.0, mean_rel=0.35930871963500977, max_rel=4250.0, norm_rel=0.023915166035294533, ref_abs_avg=43.76157760620117, test_abs_avg=43.75762939453125
production_forward2 grad[29] vs paper_forward: mean_abs=0.7940226793289185, max_abs=3.0, mean_rel=0.24780339002609253, max_rel=33.26356887817383, norm_rel=0.02294526994228363, ref_abs_avg=33.977142333984375, test_abs_avg=33.957359313964844
production_forward2 grad[30] vs paper_forward: mean_abs=1.0251898765563965, max_abs=7.125, mean_rel=0.18047362565994263, max_rel=1488.747802734375, norm_rel=0.02568155899643898, ref_abs_avg=40.13592529296875, test_abs_avg=40.13538360595703
production_forward2 grad[31] vs paper_forward: mean_abs=0.9510133266448975, max_abs=6.03125, mean_rel=0.26412689685821533, max_rel=2281.25, norm_rel=0.024136196821928024, ref_abs_avg=39.564552307128906, test_abs_avg=39.56824493408203
production_forward2 grad[32] vs paper_forward: mean_abs=0.7774825096130371, max_abs=3.0, mean_rel=0.16732153296470642, max_rel=24.423404693603516, norm_rel=0.025903742760419846, ref_abs_avg=30.489933013916016, test_abs_avg=30.470905303955078
production_forward2 grad[33] vs paper_forward: mean_abs=0.9579389095306396, max_abs=7.0, mean_rel=0.16662254929542542, max_rel=1680.688232421875, norm_rel=0.025716686621308327, ref_abs_avg=37.379493713378906, test_abs_avg=37.383262634277344
production_forward2 grad[34] vs paper_forward: mean_abs=0.8922616839408875, max_abs=5.25, mean_rel=0.2771426737308502, max_rel=2874.999755859375, norm_rel=0.024204682558774948, ref_abs_avg=37.01619338989258, test_abs_avg=37.02705383300781
production_forward2 grad[35] vs paper_forward: mean_abs=0.6764774322509766, max_abs=2.546875, mean_rel=0.11376909166574478, max_rel=11.531911849975586, norm_rel=0.023135142400860786, ref_abs_avg=29.347057342529297, test_abs_avg=29.31939697265625
production_forward2 grad[36] vs paper_forward: mean_abs=0.8928603529930115, max_abs=6.5, mean_rel=0.16015389561653137, max_rel=1117.2171630859375, norm_rel=0.025392305105924606, ref_abs_avg=35.32073211669922, test_abs_avg=35.324554443359375
production_forward2 grad[37] vs paper_forward: mean_abs=0.8332715034484863, max_abs=5.5, mean_rel=0.2747458517551422, max_rel=2062.5, norm_rel=0.024016864597797394, ref_abs_avg=34.81201171875, test_abs_avg=34.81011962890625
production_forward2 grad[38] vs paper_forward: mean_abs=0.6480854153633118, max_abs=3.0, mean_rel=0.11932405829429626, max_rel=9.614574432373047, norm_rel=0.02358151227235794, ref_abs_avg=27.47567367553711, test_abs_avg=27.496509552001953
production_forward2 grad[39] vs paper_forward: mean_abs=0.8360927700996399, max_abs=5.75, mean_rel=0.16754373908042908, max_rel=886.9954833984375, norm_rel=0.025114772841334343, ref_abs_avg=33.408267974853516, test_abs_avg=33.408958435058594
production_forward2 grad[40] vs paper_forward: mean_abs=0.783977210521698, max_abs=4.875, mean_rel=0.25682002305984497, max_rel=2718.749755859375, norm_rel=0.02385442890226841, ref_abs_avg=32.94148254394531, test_abs_avg=32.939308166503906
production_forward2 grad[41] vs paper_forward: mean_abs=0.6256797909736633, max_abs=2.5, mean_rel=0.09790411591529846, max_rel=5.648330211639404, norm_rel=0.024517124518752098, ref_abs_avg=25.666168212890625, test_abs_avg=25.601409912109375
production_forward2 grad[42] vs paper_forward: mean_abs=0.7966942191123962, max_abs=5.0, mean_rel=0.152598038315773, max_rel=840.2216796875, norm_rel=0.02491859532892704, ref_abs_avg=32.06758499145508, test_abs_avg=32.0678825378418
production_forward2 grad[43] vs paper_forward: mean_abs=0.7437758445739746, max_abs=4.5, mean_rel=0.26963499188423157, max_rel=1984.3748779296875, norm_rel=0.02350441738963127, ref_abs_avg=31.721424102783203, test_abs_avg=31.722238540649414
production_forward2 grad[44] vs paper_forward: mean_abs=0.6174259185791016, max_abs=3.015625, mean_rel=0.07811717689037323, max_rel=5.999468803405762, norm_rel=0.024307304993271828, ref_abs_avg=26.002662658691406, test_abs_avg=26.04584312438965
production_forward2 grad[45] vs paper_forward: mean_abs=0.756985068321228, max_abs=5.0, mean_rel=0.159613698720932, max_rel=1703.303955078125, norm_rel=0.024667149409651756, ref_abs_avg=30.77722930908203, test_abs_avg=30.780017852783203
production_forward2 grad[46] vs paper_forward: mean_abs=0.7029603123664856, max_abs=4.4375, mean_rel=0.2728354334831238, max_rel=3124.999755859375, norm_rel=0.023222729563713074, ref_abs_avg=30.39427947998047, test_abs_avg=30.40031623840332
production_forward2 grad[47] vs paper_forward: mean_abs=0.5586793422698975, max_abs=2.25, mean_rel=0.08571529388427734, max_rel=2.802093982696533, norm_rel=0.02284584380686283, ref_abs_avg=24.815248489379883, test_abs_avg=24.820526123046875
production_forward2 grad[48] vs paper_forward: mean_abs=0.7284860610961914, max_abs=5.0, mean_rel=0.16343986988067627, max_rel=1375.434326171875, norm_rel=0.024544674903154373, ref_abs_avg=29.768997192382812, test_abs_avg=29.770132064819336
production_forward2 grad[49] vs paper_forward: mean_abs=0.6801660656929016, max_abs=4.5, mean_rel=0.23853464424610138, max_rel=1773.4373779296875, norm_rel=0.023372387513518333, ref_abs_avg=29.155614852905273, test_abs_avg=29.15682601928711
production_forward2 grad[50] vs paper_forward: mean_abs=0.6410731077194214, max_abs=2.5, mean_rel=0.1987018883228302, max_rel=60.60416030883789, norm_rel=0.02444390207529068, ref_abs_avg=26.559221267700195, test_abs_avg=26.577816009521484
production_forward2 grad[51] vs paper_forward: mean_abs=0.8087671995162964, max_abs=6.0, mean_rel=0.16266968846321106, max_rel=1387.822021484375, norm_rel=0.026042761281132698, ref_abs_avg=31.16128921508789, test_abs_avg=31.161582946777344
production_forward2 grad[52] vs paper_forward: mean_abs=0.7584042549133301, max_abs=4.5625, mean_rel=0.287411630153656, max_rel=2781.249755859375, norm_rel=0.024960314854979515, ref_abs_avg=30.51162338256836, test_abs_avg=30.510801315307617
production_forward2 grad[53] vs paper_forward: mean_abs=0.5897030830383301, max_abs=2.25, mean_rel=0.1619981974363327, max_rel=25.25955581665039, norm_rel=0.025897284969687462, ref_abs_avg=22.798656463623047, test_abs_avg=22.860084533691406
production_forward2 grad[54] vs paper_forward: mean_abs=0.7475987672805786, max_abs=5.5, mean_rel=0.16832280158996582, max_rel=1224.8902587890625, norm_rel=0.025797734037041664, ref_abs_avg=29.05321502685547, test_abs_avg=29.05185317993164
production_forward2 grad[55] vs paper_forward: mean_abs=0.6958122253417969, max_abs=4.28125, mean_rel=0.25045761466026306, max_rel=1812.4998779296875, norm_rel=0.024758880957961082, ref_abs_avg=28.195940017700195, test_abs_avg=28.195964813232422
production_forward2 grad[56] vs paper_forward: mean_abs=0.49421119689941406, max_abs=2.75, mean_rel=0.06639717519283295, max_rel=2.9021451473236084, norm_rel=0.022427866235375404, ref_abs_avg=23.53277587890625, test_abs_avg=23.518041610717773
production_forward2 grad[57] vs paper_forward: mean_abs=0.695372462272644, max_abs=5.5, mean_rel=0.16669374704360962, max_rel=1546.611328125, norm_rel=0.025293312966823578, ref_abs_avg=27.537967681884766, test_abs_avg=27.537700653076172
production_forward2 grad[58] vs paper_forward: mean_abs=0.6469305753707886, max_abs=4.0, mean_rel=0.22469159960746765, max_rel=1999.9998779296875, norm_rel=0.023841656744480133, ref_abs_avg=27.171287536621094, test_abs_avg=27.176084518432617
production_forward2 grad[59] vs paper_forward: mean_abs=0.5309523344039917, max_abs=2.1875, mean_rel=0.14940860867500305, max_rel=14.36355972290039, norm_rel=0.025747336447238922, ref_abs_avg=20.528030395507812, test_abs_avg=20.53323745727539
production_forward2 grad[60] vs paper_forward: mean_abs=0.650511622428894, max_abs=5.0, mean_rel=0.17456494271755219, max_rel=2266.59912109375, norm_rel=0.025028616189956665, ref_abs_avg=26.024024963378906, test_abs_avg=26.02186393737793
production_forward2 grad[61] vs paper_forward: mean_abs=0.6073215007781982, max_abs=4.0, mean_rel=0.24436995387077332, max_rel=2781.249755859375, norm_rel=0.023583099246025085, ref_abs_avg=25.744098663330078, test_abs_avg=25.748802185058594
production_forward2 grad[62] vs paper_forward: mean_abs=0.47304463386535645, max_abs=1.71875, mean_rel=0.09327293187379837, max_rel=2.652101755142212, norm_rel=0.022353293374180794, ref_abs_avg=20.175086975097656, test_abs_avg=20.179363250732422
production_forward2 grad[63] vs paper_forward: mean_abs=0.6105145215988159, max_abs=5.0, mean_rel=0.15618832409381866, max_rel=657.7186279296875, norm_rel=0.02455207146704197, ref_abs_avg=24.879642486572266, test_abs_avg=24.879802703857422
production_forward2 grad[64] vs paper_forward: mean_abs=0.5717983245849609, max_abs=4.0, mean_rel=0.30741894245147705, max_rel=2624.999755859375, norm_rel=0.023094700649380684, ref_abs_avg=24.78605842590332, test_abs_avg=24.798099517822266
production_forward2 grad[65] vs paper_forward: mean_abs=0.44635486602783203, max_abs=1.78125, mean_rel=0.08934175968170166, max_rel=9.086952209472656, norm_rel=0.02233235165476799, ref_abs_avg=19.897676467895508, test_abs_avg=19.89260482788086
production_forward2 grad[66] vs paper_forward: mean_abs=0.5778524875640869, max_abs=4.5, mean_rel=0.15431441366672516, max_rel=903.2901611328125, norm_rel=0.023862503468990326, ref_abs_avg=24.233898162841797, test_abs_avg=24.233306884765625
production_forward2 grad[67] vs paper_forward: mean_abs=0.5389134883880615, max_abs=4.0, mean_rel=0.21711021661758423, max_rel=1749.9998779296875, norm_rel=0.02215246856212616, ref_abs_avg=24.287477493286133, test_abs_avg=24.281997680664062
production_forward2 grad[68] vs paper_forward: mean_abs=0.4301927089691162, max_abs=1.75, mean_rel=0.0693010464310646, max_rel=1.370891809463501, norm_rel=0.02228698693215847, ref_abs_avg=20.044126510620117, test_abs_avg=20.02105712890625
production_forward2 grad[69] vs paper_forward: mean_abs=0.5491492748260498, max_abs=5.0, mean_rel=0.15076929330825806, max_rel=1217.403564453125, norm_rel=0.023866085335612297, ref_abs_avg=23.04425048828125, test_abs_avg=23.045671463012695
production_forward2 grad[70] vs paper_forward: mean_abs=0.5088586807250977, max_abs=3.515625, mean_rel=0.20748254656791687, max_rel=2187.5, norm_rel=0.02178054116666317, ref_abs_avg=23.329242706298828, test_abs_avg=23.322521209716797
production_forward2 grad[71] vs paper_forward: mean_abs=0.38674163818359375, max_abs=1.5625, mean_rel=0.07088451087474823, max_rel=2.381079912185669, norm_rel=0.02067910134792328, ref_abs_avg=18.363231658935547, test_abs_avg=18.337921142578125
production_forward2 grad[72] vs paper_forward: mean_abs=0.5284834504127502, max_abs=4.59375, mean_rel=0.15216463804244995, max_rel=772.109130859375, norm_rel=0.0236225388944149, ref_abs_avg=22.376535415649414, test_abs_avg=22.377405166625977
production_forward2 grad[73] vs paper_forward: mean_abs=0.48607558012008667, max_abs=3.25, mean_rel=0.20863758027553558, max_rel=1125.0, norm_rel=0.02193559892475605, ref_abs_avg=22.10736083984375, test_abs_avg=22.10739517211914
production_forward2 grad[74] vs paper_forward: mean_abs=0.4795114994049072, max_abs=2.125, mean_rel=0.10302495211362839, max_rel=6.183169364929199, norm_rel=0.024184564128518105, ref_abs_avg=19.88960838317871, test_abs_avg=19.83453369140625
production_forward2 grad[75] vs paper_forward: mean_abs=0.5925881862640381, max_abs=5.0, mean_rel=0.1663198471069336, max_rel=1071.5506591796875, norm_rel=0.024811869487166405, ref_abs_avg=23.941505432128906, test_abs_avg=23.943477630615234
production_forward2 grad[76] vs paper_forward: mean_abs=0.5480004549026489, max_abs=3.875, mean_rel=0.20217761397361755, max_rel=2031.2498779296875, norm_rel=0.02293343096971512, ref_abs_avg=24.00171661376953, test_abs_avg=24.001853942871094
production_forward2 grad[77] vs paper_forward: mean_abs=0.44599682092666626, max_abs=1.5, mean_rel=0.2331911027431488, max_rel=37.596534729003906, norm_rel=0.02331848442554474, ref_abs_avg=19.023082733154297, test_abs_avg=19.059837341308594
production_forward2 grad[78] vs paper_forward: mean_abs=0.5485919117927551, max_abs=4.75, mean_rel=0.14867068827152252, max_rel=1042.059326171875, norm_rel=0.024125689640641212, ref_abs_avg=22.803691864013672, test_abs_avg=22.80560302734375
production_forward2 grad[79] vs paper_forward: mean_abs=0.5063979625701904, max_abs=3.875, mean_rel=0.20780906081199646, max_rel=1960.9373779296875, norm_rel=0.022604554891586304, ref_abs_avg=22.47801399230957, test_abs_avg=22.479162216186523
production_forward2 grad[80] vs paper_forward: mean_abs=0.38921791315078735, max_abs=1.71875, mean_rel=0.1269828975200653, max_rel=13.241498947143555, norm_rel=0.02238897979259491, ref_abs_avg=17.525646209716797, test_abs_avg=17.5333194732666
production_forward2 grad[81] vs paper_forward: mean_abs=0.5088188648223877, max_abs=4.5, mean_rel=0.1480741947889328, max_rel=1055.12109375, norm_rel=0.023430870845913887, ref_abs_avg=21.79647445678711, test_abs_avg=21.797826766967773
production_forward2 grad[82] vs paper_forward: mean_abs=0.46769630908966064, max_abs=4.5, mean_rel=0.24414926767349243, max_rel=1374.9998779296875, norm_rel=0.022040950134396553, ref_abs_avg=21.297077178955078, test_abs_avg=21.29102325439453
production_forward2 grad[83] vs paper_forward: mean_abs=0.3956001102924347, max_abs=1.65625, mean_rel=0.10353507101535797, max_rel=7.078819274902344, norm_rel=0.022467637434601784, ref_abs_avg=16.812191009521484, test_abs_avg=16.802507400512695
production_forward2 grad[84] vs paper_forward: mean_abs=0.48074984550476074, max_abs=6.0, mean_rel=0.14668697118759155, max_rel=886.2649536132812, norm_rel=0.023302897810935974, ref_abs_avg=20.68723487854004, test_abs_avg=20.687923431396484
production_forward2 grad[85] vs paper_forward: mean_abs=0.4475710391998291, max_abs=3.5, mean_rel=0.20849770307540894, max_rel=1250.0, norm_rel=0.021783461794257164, ref_abs_avg=20.669872283935547, test_abs_avg=20.669322967529297
production_forward2 grad[86] vs paper_forward: mean_abs=0.35093560814857483, max_abs=1.6875, mean_rel=0.11820357292890549, max_rel=8.844561576843262, norm_rel=0.020616086199879646, ref_abs_avg=17.455947875976562, test_abs_avg=17.466861724853516
production_forward2 grad[87] vs paper_forward: mean_abs=0.4539855122566223, max_abs=4.5, mean_rel=0.13567176461219788, max_rel=690.0256958007812, norm_rel=0.022706283256411552, ref_abs_avg=20.121456146240234, test_abs_avg=20.121959686279297
production_forward2 grad[88] vs paper_forward: mean_abs=0.4213108420372009, max_abs=4.0, mean_rel=0.19050097465515137, max_rel=1937.4998779296875, norm_rel=0.0212636049836874, ref_abs_avg=19.941051483154297, test_abs_avg=19.93594741821289
production_forward2 grad[89] vs paper_forward: mean_abs=0.3385190963745117, max_abs=1.25, mean_rel=0.07812889665365219, max_rel=2.7496490478515625, norm_rel=0.021160144358873367, ref_abs_avg=15.970993995666504, test_abs_avg=15.97926139831543
production_forward2 grad[90] vs paper_forward: mean_abs=0.4242115020751953, max_abs=4.5, mean_rel=0.13855285942554474, max_rel=1392.9749755859375, norm_rel=0.02208978496491909, ref_abs_avg=19.37919807434082, test_abs_avg=19.380046844482422
production_forward2 grad[91] vs paper_forward: mean_abs=0.3882216811180115, max_abs=3.25, mean_rel=0.15997616946697235, max_rel=1062.5, norm_rel=0.02011406049132347, ref_abs_avg=19.40580177307129, test_abs_avg=19.39910316467285
production_forward2 grad[92] vs paper_forward: mean_abs=0.3307461738586426, max_abs=1.265625, mean_rel=0.0702226459980011, max_rel=3.5401113033294678, norm_rel=0.020061034709215164, ref_abs_avg=16.783517837524414, test_abs_avg=16.786245346069336
production_forward2 grad[93] vs paper_forward: mean_abs=0.40319448709487915, max_abs=4.8125, mean_rel=0.1319807767868042, max_rel=1119.5272216796875, norm_rel=0.021609239280223846, ref_abs_avg=18.885150909423828, test_abs_avg=18.886720657348633
production_forward2 grad[94] vs paper_forward: mean_abs=0.36537134647369385, max_abs=3.25, mean_rel=0.16524389386177063, max_rel=1046.875, norm_rel=0.019946876913309097, ref_abs_avg=18.510009765625, test_abs_avg=18.50087547302246
production_forward2 grad[95] vs paper_forward: mean_abs=0.3004646301269531, max_abs=1.25, mean_rel=0.07339179515838623, max_rel=4.7839250564575195, norm_rel=0.019372349604964256, ref_abs_avg=15.578657150268555, test_abs_avg=15.592720985412598
production_forward2 grad[96] vs paper_forward: mean_abs=0.37940460443496704, max_abs=5.0, mean_rel=0.13221071660518646, max_rel=1049.5596923828125, norm_rel=0.021261977031826973, ref_abs_avg=18.131301879882812, test_abs_avg=18.13292694091797
production_forward2 grad[97] vs paper_forward: mean_abs=0.3451353907585144, max_abs=3.25, mean_rel=0.16931459307670593, max_rel=1578.1248779296875, norm_rel=0.01988258585333824, ref_abs_avg=17.683982849121094, test_abs_avg=17.681114196777344
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  165.973 ms
torch_compile_phases_forward bwd-only: 132.638 ms
torch_compile_phases_forward peak allocated: fwd=12.781 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.078 GiB, fwd+bwd=17.330 GiB
production_forward fwd+bwd:  116.468 ms
production_forward bwd-only: 96.059 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.301 GiB, fwd+bwd=10.301 GiB
paper_forward fwd+bwd:  382.204 ms
paper_forward bwd-only: 302.037 ms
paper_forward peak allocated: fwd=29.706 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.721 GiB, fwd+bwd=32.471 GiB
production_forward2 fwd+bwd:  114.350 ms
production_forward2 bwd-only: 95.913 ms
production_forward2 peak allocated: fwd=3.071 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=3.301 GiB, fwd+bwd=11.301 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016410724492743611, max_abs=0.041015625
production_forward grad[0] vs paper_forward: mean_abs=0.008319206535816193, max_abs=0.578125, mean_rel=0.07162776589393616, max_rel=103.6353759765625, norm_rel=0.01968514360487461, ref_abs_avg=0.4604714512825012, test_abs_avg=0.46048444509506226
production_forward grad[1] vs paper_forward: mean_abs=7.259819030761719, max_abs=49.0, mean_rel=0.1910841166973114, max_rel=1082.11328125, norm_rel=0.020662058144807816, ref_abs_avg=317.243408203125, test_abs_avg=317.32293701171875
production_forward grad[2] vs paper_forward: mean_abs=1.1801280975341797, max_abs=5.0, mean_rel=0.0965026468038559, max_rel=5.137842178344727, norm_rel=0.02242247387766838, ref_abs_avg=52.96527862548828, test_abs_avg=52.88824462890625
production_forward grad[3] vs paper_forward: mean_abs=1.579169511795044, max_abs=11.0, mean_rel=0.18002283573150635, max_rel=4320.224609375, norm_rel=0.024408744648098946, ref_abs_avg=65.1314468383789, test_abs_avg=65.12832641601562
production_forward grad[4] vs paper_forward: mean_abs=1.462282657623291, max_abs=11.0, mean_rel=0.33800601959228516, max_rel=3062.499755859375, norm_rel=0.02283172868192196, ref_abs_avg=64.41191864013672, test_abs_avg=64.40867614746094
production_forward grad[5] vs paper_forward: mean_abs=1.1004072427749634, max_abs=4.0, mean_rel=0.08155535161495209, max_rel=3.7611658573150635, norm_rel=0.023828351870179176, ref_abs_avg=47.63768005371094, test_abs_avg=47.59792709350586
production_forward grad[6] vs paper_forward: mean_abs=1.4097466468811035, max_abs=10.0, mean_rel=0.15224063396453857, max_rel=1486.7911376953125, norm_rel=0.02413412369787693, ref_abs_avg=58.87807846069336, test_abs_avg=58.881221771240234
production_forward grad[7] vs paper_forward: mean_abs=1.2949414253234863, max_abs=8.125, mean_rel=0.3408883810043335, max_rel=3937.499755859375, norm_rel=0.02236642874777317, ref_abs_avg=58.20287322998047, test_abs_avg=58.20112609863281
production_forward grad[8] vs paper_forward: mean_abs=1.006307601928711, max_abs=4.53125, mean_rel=0.0731959268450737, max_rel=10.956306457519531, norm_rel=0.022914182394742966, ref_abs_avg=43.648765563964844, test_abs_avg=43.5791015625
production_forward grad[9] vs paper_forward: mean_abs=1.2777938842773438, max_abs=9.0, mean_rel=0.15939754247665405, max_rel=1497.72314453125, norm_rel=0.023979684337973595, ref_abs_avg=53.635597229003906, test_abs_avg=53.63672637939453
production_forward grad[10] vs paper_forward: mean_abs=1.1821503639221191, max_abs=7.34375, mean_rel=0.3740265369415283, max_rel=3624.999755859375, norm_rel=0.022362804040312767, ref_abs_avg=53.098655700683594, test_abs_avg=53.10496520996094
production_forward grad[11] vs paper_forward: mean_abs=0.9199647903442383, max_abs=3.5, mean_rel=0.1271977424621582, max_rel=28.247100830078125, norm_rel=0.02293217182159424, ref_abs_avg=41.271697998046875, test_abs_avg=41.26821517944336
production_forward grad[12] vs paper_forward: mean_abs=1.1837594509124756, max_abs=9.0, mean_rel=0.152363583445549, max_rel=1402.2169189453125, norm_rel=0.02381269820034504, ref_abs_avg=50.05084991455078, test_abs_avg=50.05415344238281
production_forward grad[13] vs paper_forward: mean_abs=1.0849943161010742, max_abs=7.0, mean_rel=0.29307910799980164, max_rel=3124.999755859375, norm_rel=0.022124582901597023, ref_abs_avg=49.304466247558594, test_abs_avg=49.31066131591797
production_forward grad[14] vs paper_forward: mean_abs=0.8813962936401367, max_abs=3.0, mean_rel=0.12473294138908386, max_rel=11.239185333251953, norm_rel=0.02430218830704689, ref_abs_avg=36.03278350830078, test_abs_avg=36.06724166870117
production_forward grad[15] vs paper_forward: mean_abs=1.1092122793197632, max_abs=8.0, mean_rel=0.1590358018875122, max_rel=2058.944091796875, norm_rel=0.02367069013416767, ref_abs_avg=47.13311004638672, test_abs_avg=47.13728332519531
production_forward grad[16] vs paper_forward: mean_abs=1.0196797847747803, max_abs=6.25, mean_rel=0.3300090432167053, max_rel=4812.5, norm_rel=0.022095760330557823, ref_abs_avg=46.403106689453125, test_abs_avg=46.40524673461914
production_forward grad[17] vs paper_forward: mean_abs=0.8176059722900391, max_abs=3.75, mean_rel=0.11726100742816925, max_rel=7.800105571746826, norm_rel=0.023043159395456314, ref_abs_avg=35.307735443115234, test_abs_avg=35.26032257080078
production_forward grad[18] vs paper_forward: mean_abs=1.0373969078063965, max_abs=7.0, mean_rel=0.16247686743736267, max_rel=1542.7904052734375, norm_rel=0.023618340492248535, ref_abs_avg=44.22636413574219, test_abs_avg=44.23026657104492
production_forward grad[19] vs paper_forward: mean_abs=0.95948326587677, max_abs=6.25, mean_rel=0.2978738248348236, max_rel=2624.999755859375, norm_rel=0.022123362869024277, ref_abs_avg=43.63001251220703, test_abs_avg=43.636112213134766
production_forward grad[20] vs paper_forward: mean_abs=0.7751684188842773, max_abs=2.875, mean_rel=0.09294810146093369, max_rel=7.359677314758301, norm_rel=0.021755076944828033, ref_abs_avg=35.964942932128906, test_abs_avg=35.960472106933594
production_forward grad[21] vs paper_forward: mean_abs=0.9850039482116699, max_abs=8.0, mean_rel=0.15506155788898468, max_rel=1026.2646484375, norm_rel=0.023495608940720558, ref_abs_avg=42.15679168701172, test_abs_avg=42.15882873535156
production_forward grad[22] vs paper_forward: mean_abs=0.9021918177604675, max_abs=6.0, mean_rel=0.28571611642837524, max_rel=2593.749755859375, norm_rel=0.021760201081633568, ref_abs_avg=41.65003967285156, test_abs_avg=41.65376663208008
production_forward grad[23] vs paper_forward: mean_abs=0.7249305248260498, max_abs=3.5, mean_rel=0.0726395845413208, max_rel=3.9770474433898926, norm_rel=0.02270941063761711, ref_abs_avg=33.03199005126953, test_abs_avg=33.025875091552734
production_forward grad[24] vs paper_forward: mean_abs=0.9442912340164185, max_abs=7.0, mean_rel=0.15563669800758362, max_rel=1809.12451171875, norm_rel=0.02327927015721798, ref_abs_avg=40.84514617919922, test_abs_avg=40.84474182128906
production_forward grad[25] vs paper_forward: mean_abs=0.8647905588150024, max_abs=5.0, mean_rel=0.24295209348201752, max_rel=2125.0, norm_rel=0.021637756377458572, ref_abs_avg=40.23675537109375, test_abs_avg=40.23945999145508
production_forward grad[26] vs paper_forward: mean_abs=0.833648681640625, max_abs=3.0, mean_rel=0.10035983473062515, max_rel=11.662466049194336, norm_rel=0.0225928146392107, ref_abs_avg=37.84254837036133, test_abs_avg=37.94187545776367
production_forward grad[27] vs paper_forward: mean_abs=1.0886896848678589, max_abs=7.125, mean_rel=0.16257639229297638, max_rel=1427.86181640625, norm_rel=0.025130687281489372, ref_abs_avg=43.57592010498047, test_abs_avg=43.580448150634766
production_forward grad[28] vs paper_forward: mean_abs=1.0089184045791626, max_abs=6.0, mean_rel=0.3192363977432251, max_rel=4625.0, norm_rel=0.023560568690299988, ref_abs_avg=43.09282684326172, test_abs_avg=43.09620666503906
production_forward grad[29] vs paper_forward: mean_abs=0.7506685256958008, max_abs=3.0, mean_rel=0.06617866456508636, max_rel=2.993988513946533, norm_rel=0.023227807134389877, ref_abs_avg=32.62506866455078, test_abs_avg=32.597137451171875
production_forward grad[30] vs paper_forward: mean_abs=1.0141537189483643, max_abs=9.0, mean_rel=0.16655148565769196, max_rel=1445.1107177734375, norm_rel=0.025544704869389534, ref_abs_avg=39.957679748535156, test_abs_avg=39.96090316772461
production_forward grad[31] vs paper_forward: mean_abs=0.9452959299087524, max_abs=6.0, mean_rel=0.2928418219089508, max_rel=2906.249755859375, norm_rel=0.023946251720190048, ref_abs_avg=39.68671417236328, test_abs_avg=39.69540023803711
production_forward grad[32] vs paper_forward: mean_abs=0.6989707946777344, max_abs=2.625, mean_rel=0.07704739272594452, max_rel=1.9211852550506592, norm_rel=0.02334577962756157, ref_abs_avg=30.462757110595703, test_abs_avg=30.418365478515625
production_forward grad[33] vs paper_forward: mean_abs=0.9377201199531555, max_abs=6.5, mean_rel=0.1656368225812912, max_rel=1275.8785400390625, norm_rel=0.025220131501555443, ref_abs_avg=37.35237121582031, test_abs_avg=37.35234069824219
production_forward grad[34] vs paper_forward: mean_abs=0.870753288269043, max_abs=5.5625, mean_rel=0.3484418988227844, max_rel=3374.999755859375, norm_rel=0.023979943245649338, ref_abs_avg=36.54576110839844, test_abs_avg=36.54651641845703
production_forward grad[35] vs paper_forward: mean_abs=0.6926326751708984, max_abs=2.625, mean_rel=0.10888966917991638, max_rel=6.490316867828369, norm_rel=0.02382475696504116, ref_abs_avg=28.69809341430664, test_abs_avg=28.74851417541504
production_forward grad[36] vs paper_forward: mean_abs=0.8724910020828247, max_abs=6.0, mean_rel=0.16444607079029083, max_rel=1593.6529541015625, norm_rel=0.025060610845685005, ref_abs_avg=34.9418830871582, test_abs_avg=34.94378662109375
production_forward grad[37] vs paper_forward: mean_abs=0.8102838397026062, max_abs=4.625, mean_rel=0.2768130302429199, max_rel=2968.749755859375, norm_rel=0.023419378325343132, ref_abs_avg=34.69255828857422, test_abs_avg=34.686622619628906
production_forward grad[38] vs paper_forward: mean_abs=0.665855884552002, max_abs=2.375, mean_rel=0.11141304671764374, max_rel=6.392958164215088, norm_rel=0.02444535307586193, ref_abs_avg=26.91352081298828, test_abs_avg=26.94097137451172
production_forward grad[39] vs paper_forward: mean_abs=0.8279244899749756, max_abs=7.0, mean_rel=0.15489766001701355, max_rel=1174.657470703125, norm_rel=0.02481679990887642, ref_abs_avg=33.481414794921875, test_abs_avg=33.48191452026367
production_forward grad[40] vs paper_forward: mean_abs=0.7655484080314636, max_abs=4.75, mean_rel=0.2847217321395874, max_rel=2406.25, norm_rel=0.023388396948575974, ref_abs_avg=32.806495666503906, test_abs_avg=32.80534362792969
production_forward grad[41] vs paper_forward: mean_abs=0.5828123092651367, max_abs=2.75, mean_rel=0.08674265444278717, max_rel=4.3054938316345215, norm_rel=0.022196931764483452, ref_abs_avg=27.325843811035156, test_abs_avg=27.264678955078125
production_forward grad[42] vs paper_forward: mean_abs=0.7883636951446533, max_abs=5.5, mean_rel=0.16026368737220764, max_rel=1635.072021484375, norm_rel=0.024580324068665504, ref_abs_avg=32.16605758666992, test_abs_avg=32.16596221923828
production_forward grad[43] vs paper_forward: mean_abs=0.7294220924377441, max_abs=4.75, mean_rel=0.285844624042511, max_rel=2640.624755859375, norm_rel=0.023003727197647095, ref_abs_avg=31.779399871826172, test_abs_avg=31.778789520263672
production_forward grad[44] vs paper_forward: mean_abs=0.6096258163452148, max_abs=3.0, mean_rel=0.1588992029428482, max_rel=34.944366455078125, norm_rel=0.02366635948419571, ref_abs_avg=25.236848831176758, test_abs_avg=25.253612518310547
production_forward grad[45] vs paper_forward: mean_abs=0.7500619292259216, max_abs=5.5, mean_rel=0.16133928298950195, max_rel=1244.8831787109375, norm_rel=0.024345794692635536, ref_abs_avg=30.897151947021484, test_abs_avg=30.898426055908203
production_forward grad[46] vs paper_forward: mean_abs=0.690048336982727, max_abs=4.375, mean_rel=0.2532954216003418, max_rel=2624.999755859375, norm_rel=0.022980205714702606, ref_abs_avg=30.12343406677246, test_abs_avg=30.12572479248047
production_forward grad[47] vs paper_forward: mean_abs=0.5496730804443359, max_abs=2.28125, mean_rel=0.08734630048274994, max_rel=5.466137886047363, norm_rel=0.02191532775759697, ref_abs_avg=25.2907772064209, test_abs_avg=25.26406478881836
production_forward grad[48] vs paper_forward: mean_abs=0.7194473743438721, max_abs=5.0, mean_rel=0.15575221180915833, max_rel=850.9185791015625, norm_rel=0.024025486782193184, ref_abs_avg=30.027250289916992, test_abs_avg=30.02720069885254
production_forward grad[49] vs paper_forward: mean_abs=0.6648123264312744, max_abs=4.5, mean_rel=0.2118571400642395, max_rel=1749.9998779296875, norm_rel=0.022608142346143723, ref_abs_avg=29.518268585205078, test_abs_avg=29.513710021972656
production_forward grad[50] vs paper_forward: mean_abs=0.6249294281005859, max_abs=2.640625, mean_rel=0.14798995852470398, max_rel=32.953487396240234, norm_rel=0.025028692558407784, ref_abs_avg=25.81292724609375, test_abs_avg=25.799909591674805
production_forward grad[51] vs paper_forward: mean_abs=0.7952984571456909, max_abs=6.5, mean_rel=0.1733742356300354, max_rel=1288.87890625, norm_rel=0.026033027097582817, ref_abs_avg=30.628665924072266, test_abs_avg=30.629901885986328
production_forward grad[52] vs paper_forward: mean_abs=0.739237368106842, max_abs=5.25, mean_rel=0.30165576934814453, max_rel=2874.999755859375, norm_rel=0.02442113123834133, ref_abs_avg=30.36155891418457, test_abs_avg=30.357763290405273
production_forward grad[53] vs paper_forward: mean_abs=0.5597400665283203, max_abs=2.0, mean_rel=0.11748513579368591, max_rel=8.575945854187012, norm_rel=0.024888888001441956, ref_abs_avg=22.6004638671875, test_abs_avg=22.574295043945312
production_forward grad[54] vs paper_forward: mean_abs=0.7118804454803467, max_abs=6.5, mean_rel=0.1640949845314026, max_rel=936.0765380859375, norm_rel=0.025553440675139427, ref_abs_avg=27.926969528198242, test_abs_avg=27.929052352905273
production_forward grad[55] vs paper_forward: mean_abs=0.6623624563217163, max_abs=5.0, mean_rel=0.2356031835079193, max_rel=2031.2498779296875, norm_rel=0.02408292330801487, ref_abs_avg=27.60123062133789, test_abs_avg=27.596328735351562
production_forward grad[56] vs paper_forward: mean_abs=0.5030132532119751, max_abs=2.0, mean_rel=0.1284375637769699, max_rel=15.84483528137207, norm_rel=0.024083543568849564, ref_abs_avg=20.98218536376953, test_abs_avg=20.942697525024414
production_forward grad[57] vs paper_forward: mean_abs=0.6576272249221802, max_abs=5.0, mean_rel=0.15735456347465515, max_rel=1265.287353515625, norm_rel=0.025028342381119728, ref_abs_avg=26.348255157470703, test_abs_avg=26.348039627075195
production_forward grad[58] vs paper_forward: mean_abs=0.6168581247329712, max_abs=4.0, mean_rel=0.2766149044036865, max_rel=2687.499755859375, norm_rel=0.023702384904026985, ref_abs_avg=26.083711624145508, test_abs_avg=26.082820892333984
production_forward grad[59] vs paper_forward: mean_abs=0.49085187911987305, max_abs=1.875, mean_rel=0.14443139731884003, max_rel=24.248756408691406, norm_rel=0.023226676508784294, ref_abs_avg=21.204029083251953, test_abs_avg=21.234394073486328
production_forward grad[60] vs paper_forward: mean_abs=0.6299908757209778, max_abs=5.5, mean_rel=0.16554692387580872, max_rel=1113.51904296875, norm_rel=0.024399230256676674, ref_abs_avg=25.83926010131836, test_abs_avg=25.840587615966797
production_forward grad[61] vs paper_forward: mean_abs=0.5802606344223022, max_abs=3.8125, mean_rel=0.2133329212665558, max_rel=1656.2498779296875, norm_rel=0.02321852557361126, ref_abs_avg=25.01056480407715, test_abs_avg=25.01011848449707
production_forward grad[62] vs paper_forward: mean_abs=0.4615206718444824, max_abs=1.6328125, mean_rel=0.0996444821357727, max_rel=4.695206642150879, norm_rel=0.023654546588659286, ref_abs_avg=19.545785903930664, test_abs_avg=19.553518295288086
production_forward grad[63] vs paper_forward: mean_abs=0.5897378921508789, max_abs=4.125, mean_rel=0.14793577790260315, max_rel=919.9378662109375, norm_rel=0.02438078261911869, ref_abs_avg=24.251201629638672, test_abs_avg=24.248641967773438
production_forward grad[64] vs paper_forward: mean_abs=0.5566749572753906, max_abs=4.9375, mean_rel=0.21166102588176727, max_rel=1796.8748779296875, norm_rel=0.023271579295396805, ref_abs_avg=23.96851348876953, test_abs_avg=23.963232040405273
production_forward grad[65] vs paper_forward: mean_abs=0.45912957191467285, max_abs=1.875, mean_rel=0.1358402669429779, max_rel=24.362417221069336, norm_rel=0.02337219938635826, ref_abs_avg=19.310758590698242, test_abs_avg=19.366928100585938
production_forward grad[66] vs paper_forward: mean_abs=0.5613958835601807, max_abs=5.0, mean_rel=0.1404053121805191, max_rel=617.8242797851562, norm_rel=0.023730652406811714, ref_abs_avg=23.710424423217773, test_abs_avg=23.7113037109375
production_forward grad[67] vs paper_forward: mean_abs=0.5195797085762024, max_abs=4.0, mean_rel=0.24828507006168365, max_rel=2093.75, norm_rel=0.02209741249680519, ref_abs_avg=23.532163619995117, test_abs_avg=23.535694122314453
production_forward grad[68] vs paper_forward: mean_abs=0.4120074510574341, max_abs=1.5625, mean_rel=0.1003367230296135, max_rel=5.26877498626709, norm_rel=0.022845277562737465, ref_abs_avg=17.85465431213379, test_abs_avg=17.826175689697266
production_forward grad[69] vs paper_forward: mean_abs=0.5362250208854675, max_abs=5.0, mean_rel=0.14387249946594238, max_rel=844.5367431640625, norm_rel=0.023327354341745377, ref_abs_avg=23.00177001953125, test_abs_avg=23.001888275146484
production_forward grad[70] vs paper_forward: mean_abs=0.4929126501083374, max_abs=3.75, mean_rel=0.1890943944454193, max_rel=2718.749755859375, norm_rel=0.02164899930357933, ref_abs_avg=22.775409698486328, test_abs_avg=22.7747745513916
production_forward grad[71] vs paper_forward: mean_abs=0.40776968002319336, max_abs=1.67626953125, mean_rel=0.14710162580013275, max_rel=38.552242279052734, norm_rel=0.021678386256098747, ref_abs_avg=19.19518280029297, test_abs_avg=19.221961975097656
production_forward grad[72] vs paper_forward: mean_abs=0.5171982645988464, max_abs=4.5, mean_rel=0.15328682959079742, max_rel=1248.0328369140625, norm_rel=0.023356113582849503, ref_abs_avg=22.16229248046875, test_abs_avg=22.16176986694336
production_forward grad[73] vs paper_forward: mean_abs=0.47173652052879333, max_abs=3.375, mean_rel=0.21026596426963806, max_rel=1250.0, norm_rel=0.0213676355779171, ref_abs_avg=22.01636505126953, test_abs_avg=22.01759910583496
production_forward grad[74] vs paper_forward: mean_abs=0.44380688667297363, max_abs=2.125, mean_rel=0.07417276501655579, max_rel=2.5375804901123047, norm_rel=0.020819688215851784, ref_abs_avg=21.512020111083984, test_abs_avg=21.507007598876953
production_forward grad[75] vs paper_forward: mean_abs=0.5850083827972412, max_abs=5.5, mean_rel=0.15630434453487396, max_rel=728.6790771484375, norm_rel=0.024433184415102005, ref_abs_avg=23.97163200378418, test_abs_avg=23.97169303894043
production_forward grad[76] vs paper_forward: mean_abs=0.5400369763374329, max_abs=4.0, mean_rel=0.23764434456825256, max_rel=1562.4998779296875, norm_rel=0.022719815373420715, ref_abs_avg=23.762033462524414, test_abs_avg=23.760082244873047
production_forward grad[77] vs paper_forward: mean_abs=0.4327486753463745, max_abs=1.75, mean_rel=0.3898884057998657, max_rel=96.76748657226562, norm_rel=0.02188473753631115, ref_abs_avg=19.61001968383789, test_abs_avg=19.628414154052734
production_forward grad[78] vs paper_forward: mean_abs=0.5396877527236938, max_abs=5.5, mean_rel=0.1538805067539215, max_rel=1258.596923828125, norm_rel=0.023539992049336433, ref_abs_avg=22.980398178100586, test_abs_avg=22.978290557861328
production_forward grad[79] vs paper_forward: mean_abs=0.5014556646347046, max_abs=5.0, mean_rel=0.23839357495307922, max_rel=2250.0, norm_rel=0.022333920001983643, ref_abs_avg=22.502626419067383, test_abs_avg=22.504154205322266
production_forward grad[80] vs paper_forward: mean_abs=0.4132204055786133, max_abs=1.5, mean_rel=0.13621072471141815, max_rel=22.81290626525879, norm_rel=0.021749792620539665, ref_abs_avg=18.391765594482422, test_abs_avg=18.35148811340332
production_forward grad[81] vs paper_forward: mean_abs=0.5098507404327393, max_abs=5.125, mean_rel=0.14442500472068787, max_rel=741.1796264648438, norm_rel=0.0232958123087883, ref_abs_avg=21.93335723876953, test_abs_avg=21.932209014892578
production_forward grad[82] vs paper_forward: mean_abs=0.46933138370513916, max_abs=4.625, mean_rel=0.19571134448051453, max_rel=1531.2498779296875, norm_rel=0.02170238085091114, ref_abs_avg=21.655445098876953, test_abs_avg=21.651138305664062
production_forward grad[83] vs paper_forward: mean_abs=0.3721647262573242, max_abs=1.5, mean_rel=0.11163748800754547, max_rel=12.271679878234863, norm_rel=0.021077988669276237, ref_abs_avg=18.091938018798828, test_abs_avg=18.095794677734375
production_forward grad[84] vs paper_forward: mean_abs=0.47209128737449646, max_abs=4.0, mean_rel=0.14598335325717926, max_rel=711.8426513671875, norm_rel=0.022867344319820404, ref_abs_avg=20.74256706237793, test_abs_avg=20.743423461914062
production_forward grad[85] vs paper_forward: mean_abs=0.4472312331199646, max_abs=4.25, mean_rel=0.22854052484035492, max_rel=1312.4998779296875, norm_rel=0.021770060062408447, ref_abs_avg=20.595890045166016, test_abs_avg=20.599576950073242
production_forward grad[86] vs paper_forward: mean_abs=0.34944963455200195, max_abs=1.5625, mean_rel=0.13857115805149078, max_rel=20.52721405029297, norm_rel=0.021760523319244385, ref_abs_avg=16.10385513305664, test_abs_avg=16.120059967041016
production_forward grad[87] vs paper_forward: mean_abs=0.44546782970428467, max_abs=4.5, mean_rel=0.13183726370334625, max_rel=623.6146850585938, norm_rel=0.022224698215723038, ref_abs_avg=20.154205322265625, test_abs_avg=20.154510498046875
production_forward grad[88] vs paper_forward: mean_abs=0.4050893187522888, max_abs=4.0, mean_rel=0.17546914517879486, max_rel=1687.4998779296875, norm_rel=0.020766213536262512, ref_abs_avg=19.66566276550293, test_abs_avg=19.664653778076172
production_forward grad[89] vs paper_forward: mean_abs=0.3320337235927582, max_abs=1.5, mean_rel=0.31250688433647156, max_rel=127.10970306396484, norm_rel=0.020256411284208298, ref_abs_avg=16.784847259521484, test_abs_avg=16.768882751464844
production_forward grad[90] vs paper_forward: mean_abs=0.4185881018638611, max_abs=4.5, mean_rel=0.1285315752029419, max_rel=639.8353881835938, norm_rel=0.02169889770448208, ref_abs_avg=19.47226333618164, test_abs_avg=19.471574783325195
production_forward grad[91] vs paper_forward: mean_abs=0.3768962025642395, max_abs=3.5, mean_rel=0.1979888677597046, max_rel=1062.5, norm_rel=0.01998995617032051, ref_abs_avg=18.91483497619629, test_abs_avg=18.913963317871094
production_forward grad[92] vs paper_forward: mean_abs=0.31238794326782227, max_abs=1.25, mean_rel=0.0611598938703537, max_rel=4.468632698059082, norm_rel=0.01996074989438057, ref_abs_avg=15.877608299255371, test_abs_avg=15.870391845703125
production_forward grad[93] vs paper_forward: mean_abs=0.4019070565700531, max_abs=6.0, mean_rel=0.12926772236824036, max_rel=548.0787353515625, norm_rel=0.021402398124337196, ref_abs_avg=19.000743865966797, test_abs_avg=18.999818801879883
production_forward grad[94] vs paper_forward: mean_abs=0.3614615797996521, max_abs=4.0, mean_rel=0.16877591609954834, max_rel=1453.1248779296875, norm_rel=0.019693491980433464, ref_abs_avg=18.61383819580078, test_abs_avg=18.621437072753906
production_forward grad[95] vs paper_forward: mean_abs=0.27647167444229126, max_abs=1.0625, mean_rel=0.09872174263000488, max_rel=6.252868175506592, norm_rel=0.01821029558777809, ref_abs_avg=14.887700080871582, test_abs_avg=14.884265899658203
production_forward grad[96] vs paper_forward: mean_abs=0.3716685175895691, max_abs=4.75, mean_rel=0.11922945827245712, max_rel=574.2277221679688, norm_rel=0.020969359204173088, ref_abs_avg=18.022960662841797, test_abs_avg=18.021589279174805
production_forward grad[97] vs paper_forward: mean_abs=0.3431662321090698, max_abs=4.25, mean_rel=0.15938502550125122, max_rel=999.9999389648438, norm_rel=0.01963888294994831, ref_abs_avg=17.895389556884766, test_abs_avg=17.887432098388672
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016434473218396306, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008667205460369587, max_abs=0.578125, mean_rel=0.07426276803016663, max_rel=100.54608154296875, norm_rel=0.02039508707821369, ref_abs_avg=0.4604714512825012, test_abs_avg=0.46047288179397583
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.436827659606934, max_abs=56.0, mean_rel=0.14597275853157043, max_rel=426.98284912109375, norm_rel=0.02106988988816738, ref_abs_avg=317.243408203125, test_abs_avg=317.3724060058594
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.2610207796096802, max_abs=5.0, mean_rel=0.08893130719661713, max_rel=5.993062496185303, norm_rel=0.023964202031493187, ref_abs_avg=52.96527862548828, test_abs_avg=52.926719665527344
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.626889944076538, max_abs=10.5, mean_rel=0.18258053064346313, max_rel=4149.376953125, norm_rel=0.02513282373547554, ref_abs_avg=65.1314468383789, test_abs_avg=65.13162994384766
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5118277072906494, max_abs=10.5, mean_rel=0.349953293800354, max_rel=4187.5, norm_rel=0.023626893758773804, ref_abs_avg=64.41191864013672, test_abs_avg=64.41474914550781
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0963411331176758, max_abs=4.0, mean_rel=0.08641678839921951, max_rel=5.222023010253906, norm_rel=0.023564623668789864, ref_abs_avg=47.63768005371094, test_abs_avg=47.620399475097656
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4538321495056152, max_abs=10.0, mean_rel=0.1585974395275116, max_rel=1372.8935546875, norm_rel=0.024854611605405807, ref_abs_avg=58.87807846069336, test_abs_avg=58.88027572631836
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.342771291732788, max_abs=8.875, mean_rel=0.36731940507888794, max_rel=4625.0, norm_rel=0.023188510909676552, ref_abs_avg=58.20287322998047, test_abs_avg=58.20238494873047
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9531993865966797, max_abs=3.8125, mean_rel=0.06811980903148651, max_rel=9.824358940124512, norm_rel=0.022285563871264458, ref_abs_avg=43.648765563964844, test_abs_avg=43.59288787841797
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3168076276779175, max_abs=10.0, mean_rel=0.16950580477714539, max_rel=1915.493408203125, norm_rel=0.024696195498108864, ref_abs_avg=53.635597229003906, test_abs_avg=53.63593292236328
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2214157581329346, max_abs=7.75, mean_rel=0.4022146165370941, max_rel=3421.874755859375, norm_rel=0.023109761998057365, ref_abs_avg=53.098655700683594, test_abs_avg=53.10186004638672
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9801936149597168, max_abs=4.0, mean_rel=0.1553276926279068, max_rel=29.471698760986328, norm_rel=0.02417615056037903, ref_abs_avg=41.271697998046875, test_abs_avg=41.29200744628906
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2166762351989746, max_abs=10.0, mean_rel=0.15807311236858368, max_rel=1389.3260498046875, norm_rel=0.0244681928306818, ref_abs_avg=50.05084991455078, test_abs_avg=50.053810119628906
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1230287551879883, max_abs=7.0, mean_rel=0.29418134689331055, max_rel=2874.999755859375, norm_rel=0.022874215617775917, ref_abs_avg=49.304466247558594, test_abs_avg=49.31207275390625
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.9279270172119141, max_abs=3.5, mean_rel=0.11313538253307343, max_rel=8.22746753692627, norm_rel=0.025499334558844566, ref_abs_avg=36.03278350830078, test_abs_avg=36.06522750854492
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1377983093261719, max_abs=9.0, mean_rel=0.1675637662410736, max_rel=1991.662841796875, norm_rel=0.02427014708518982, ref_abs_avg=47.13311004638672, test_abs_avg=47.13652420043945
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0494821071624756, max_abs=6.25, mean_rel=0.348493754863739, max_rel=4875.0, norm_rel=0.02273453213274479, ref_abs_avg=46.403106689453125, test_abs_avg=46.40647888183594
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8522639274597168, max_abs=4.0, mean_rel=0.13800200819969177, max_rel=9.717081069946289, norm_rel=0.024531343951821327, ref_abs_avg=35.307735443115234, test_abs_avg=35.24386215209961
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0644534826278687, max_abs=7.0, mean_rel=0.16134202480316162, max_rel=1548.599365234375, norm_rel=0.024229461327195168, ref_abs_avg=44.22636413574219, test_abs_avg=44.2304573059082
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9872797131538391, max_abs=6.0, mean_rel=0.3150966763496399, max_rel=3187.499755859375, norm_rel=0.02275003492832184, ref_abs_avg=43.63001251220703, test_abs_avg=43.63811492919922
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7759943008422852, max_abs=3.25, mean_rel=0.09477177262306213, max_rel=9.649941444396973, norm_rel=0.02215825393795967, ref_abs_avg=35.964942932128906, test_abs_avg=35.96131896972656
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0077729225158691, max_abs=7.0, mean_rel=0.16143426299095154, max_rel=1416.5057373046875, norm_rel=0.024025484919548035, ref_abs_avg=42.15679168701172, test_abs_avg=42.15850830078125
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.927794873714447, max_abs=5.75, mean_rel=0.28759896755218506, max_rel=3093.749755859375, norm_rel=0.02236546203494072, ref_abs_avg=41.65003967285156, test_abs_avg=41.65608215332031
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7280406951904297, max_abs=3.0, mean_rel=0.07268673181533813, max_rel=6.734079837799072, norm_rel=0.023200877010822296, ref_abs_avg=33.03199005126953, test_abs_avg=33.05297088623047
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9653331637382507, max_abs=6.0, mean_rel=0.16051200032234192, max_rel=943.37646484375, norm_rel=0.02377335913479328, ref_abs_avg=40.84514617919922, test_abs_avg=40.843170166015625
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8881679773330688, max_abs=5.5, mean_rel=0.2613218128681183, max_rel=2343.75, norm_rel=0.022219214588403702, ref_abs_avg=40.23675537109375, test_abs_avg=40.23882293701172
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.9019889831542969, max_abs=3.625, mean_rel=0.09021218121051788, max_rel=12.812962532043457, norm_rel=0.024044465273618698, ref_abs_avg=37.84254837036133, test_abs_avg=37.94056701660156
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1160030364990234, max_abs=7.75, mean_rel=0.16218946874141693, max_rel=1150.4249267578125, norm_rel=0.025744184851646423, ref_abs_avg=43.57592010498047, test_abs_avg=43.57781219482422
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0364923477172852, max_abs=6.0, mean_rel=0.3328511118888855, max_rel=4625.0, norm_rel=0.024181334301829338, ref_abs_avg=43.09282684326172, test_abs_avg=43.098899841308594
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7739791870117188, max_abs=3.0, mean_rel=0.0600564144551754, max_rel=1.5409717559814453, norm_rel=0.02417406439781189, ref_abs_avg=32.62506866455078, test_abs_avg=32.57450866699219
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.035231351852417, max_abs=7.0, mean_rel=0.16894671320915222, max_rel=1605.66259765625, norm_rel=0.026070015504956245, ref_abs_avg=39.957679748535156, test_abs_avg=39.95756530761719
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9672284126281738, max_abs=5.75, mean_rel=0.301259309053421, max_rel=5000.0, norm_rel=0.024476932361721992, ref_abs_avg=39.68671417236328, test_abs_avg=39.69444274902344
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7511997222900391, max_abs=2.5, mean_rel=0.09338878095149994, max_rel=2.436062812805176, norm_rel=0.02457432635128498, ref_abs_avg=30.462757110595703, test_abs_avg=30.417078018188477
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.956804633140564, max_abs=7.0, mean_rel=0.17249608039855957, max_rel=1426.7391357421875, norm_rel=0.025729168206453323, ref_abs_avg=37.35237121582031, test_abs_avg=37.35090255737305
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8929529786109924, max_abs=6.0, mean_rel=0.3411325514316559, max_rel=3374.999755859375, norm_rel=0.02454955130815506, ref_abs_avg=36.54576110839844, test_abs_avg=36.54315185546875
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6953091621398926, max_abs=3.125, mean_rel=0.10565438866615295, max_rel=6.677425384521484, norm_rel=0.02408195473253727, ref_abs_avg=28.69809341430664, test_abs_avg=28.738685607910156
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8879709243774414, max_abs=7.0, mean_rel=0.16497737169265747, max_rel=1504.2918701171875, norm_rel=0.02549043297767639, ref_abs_avg=34.9418830871582, test_abs_avg=34.94273376464844
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8274557590484619, max_abs=5.0, mean_rel=0.2974556088447571, max_rel=2375.0, norm_rel=0.023917188867926598, ref_abs_avg=34.69255828857422, test_abs_avg=34.689361572265625
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6689717769622803, max_abs=2.4375, mean_rel=0.12007132172584534, max_rel=7.163968086242676, norm_rel=0.024882981553673744, ref_abs_avg=26.91352081298828, test_abs_avg=26.95755386352539
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8429301977157593, max_abs=7.0, mean_rel=0.16374285519123077, max_rel=1381.324462890625, norm_rel=0.025245927274227142, ref_abs_avg=33.481414794921875, test_abs_avg=33.480804443359375
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7819778323173523, max_abs=5.0, mean_rel=0.3135055601596832, max_rel=3124.999755859375, norm_rel=0.023875035345554352, ref_abs_avg=32.806495666503906, test_abs_avg=32.80499267578125
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6179523468017578, max_abs=2.5, mean_rel=0.08998185396194458, max_rel=8.02319049835205, norm_rel=0.023453176021575928, ref_abs_avg=27.325843811035156, test_abs_avg=27.274242401123047
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.8010449409484863, max_abs=5.046875, mean_rel=0.1649438887834549, max_rel=1502.804931640625, norm_rel=0.024958407506346703, ref_abs_avg=32.16605758666992, test_abs_avg=32.165283203125
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7409665584564209, max_abs=5.625, mean_rel=0.26332175731658936, max_rel=2312.5, norm_rel=0.023354509845376015, ref_abs_avg=31.779399871826172, test_abs_avg=31.77904510498047
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5999927520751953, max_abs=3.0, mean_rel=0.15713459253311157, max_rel=30.308298110961914, norm_rel=0.023527927696704865, ref_abs_avg=25.236848831176758, test_abs_avg=25.25155258178711
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.761877179145813, max_abs=5.5, mean_rel=0.16139660775661469, max_rel=1570.9105224609375, norm_rel=0.024715391919016838, ref_abs_avg=30.897151947021484, test_abs_avg=30.89727210998535
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7042502164840698, max_abs=4.5, mean_rel=0.26270174980163574, max_rel=2624.999755859375, norm_rel=0.02342296950519085, ref_abs_avg=30.12343406677246, test_abs_avg=30.1268253326416
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5525093078613281, max_abs=2.0, mean_rel=0.08625439554452896, max_rel=4.799488067626953, norm_rel=0.021601740270853043, ref_abs_avg=25.2907772064209, test_abs_avg=25.23524284362793
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.728447675704956, max_abs=6.0, mean_rel=0.1621619462966919, max_rel=798.0866088867188, norm_rel=0.024316197261214256, ref_abs_avg=30.027250289916992, test_abs_avg=30.028350830078125
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6757742166519165, max_abs=4.75, mean_rel=0.2106168270111084, max_rel=1781.2498779296875, norm_rel=0.022977257147431374, ref_abs_avg=29.518268585205078, test_abs_avg=29.513917922973633
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6419568061828613, max_abs=3.0625, mean_rel=0.11001886427402496, max_rel=16.636564254760742, norm_rel=0.025722671300172806, ref_abs_avg=25.81292724609375, test_abs_avg=25.83509063720703
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8086420297622681, max_abs=6.0, mean_rel=0.17570295929908752, max_rel=1222.5423583984375, norm_rel=0.026462402194738388, ref_abs_avg=30.628665924072266, test_abs_avg=30.629013061523438
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7534902095794678, max_abs=5.25, mean_rel=0.3178660273551941, max_rel=3249.999755859375, norm_rel=0.024901125580072403, ref_abs_avg=30.36155891418457, test_abs_avg=30.36160659790039
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5805803537368774, max_abs=2.3359375, mean_rel=0.13481763005256653, max_rel=12.513423919677734, norm_rel=0.02562723495066166, ref_abs_avg=22.6004638671875, test_abs_avg=22.590499877929688
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7236955165863037, max_abs=5.0, mean_rel=0.17010647058486938, max_rel=861.064697265625, norm_rel=0.025958336889743805, ref_abs_avg=27.926969528198242, test_abs_avg=27.927814483642578
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6740095615386963, max_abs=5.0, mean_rel=0.25018754601478577, max_rel=2500.0, norm_rel=0.024504873901605606, ref_abs_avg=27.60123062133789, test_abs_avg=27.595809936523438
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.524887204170227, max_abs=1.81640625, mean_rel=0.14258050918579102, max_rel=18.050495147705078, norm_rel=0.02491905353963375, ref_abs_avg=20.98218536376953, test_abs_avg=20.935245513916016
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6675570011138916, max_abs=5.0, mean_rel=0.15710479021072388, max_rel=1285.6075439453125, norm_rel=0.025407202541828156, ref_abs_avg=26.348255157470703, test_abs_avg=26.347177505493164
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.624839186668396, max_abs=4.5, mean_rel=0.267328679561615, max_rel=2187.5, norm_rel=0.024001892656087875, ref_abs_avg=26.083711624145508, test_abs_avg=26.083778381347656
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.47949618101119995, max_abs=2.1171875, mean_rel=0.1446247100830078, max_rel=22.552404403686523, norm_rel=0.02307659201323986, ref_abs_avg=21.204029083251953, test_abs_avg=21.236780166625977
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.638615071773529, max_abs=4.5, mean_rel=0.16121724247932434, max_rel=984.888427734375, norm_rel=0.02470838837325573, ref_abs_avg=25.83926010131836, test_abs_avg=25.83966064453125
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5886152386665344, max_abs=4.0, mean_rel=0.22521814703941345, max_rel=1718.7498779296875, norm_rel=0.023560460656881332, ref_abs_avg=25.01056480407715, test_abs_avg=25.010875701904297
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.4542686343193054, max_abs=1.9375, mean_rel=0.10656225681304932, max_rel=7.832213878631592, norm_rel=0.023855041712522507, ref_abs_avg=19.545785903930664, test_abs_avg=19.540143966674805
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5972509980201721, max_abs=4.4375, mean_rel=0.15191608667373657, max_rel=1055.464111328125, norm_rel=0.024684058502316475, ref_abs_avg=24.251201629638672, test_abs_avg=24.24843978881836
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5605674982070923, max_abs=4.1875, mean_rel=0.22073300182819366, max_rel=1531.2498779296875, norm_rel=0.023393070325255394, ref_abs_avg=23.96851348876953, test_abs_avg=23.961719512939453
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4614284038543701, max_abs=1.75, mean_rel=0.12506620585918427, max_rel=11.971046447753906, norm_rel=0.023243125528097153, ref_abs_avg=19.310758590698242, test_abs_avg=19.367786407470703
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5683475732803345, max_abs=4.75, mean_rel=0.14354106783866882, max_rel=768.8571166992188, norm_rel=0.02399565279483795, ref_abs_avg=23.710424423217773, test_abs_avg=23.711702346801758
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5242446064949036, max_abs=3.75, mean_rel=0.2593093514442444, max_rel=1812.4998779296875, norm_rel=0.022270431742072105, ref_abs_avg=23.532163619995117, test_abs_avg=23.53726577758789
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.43329763412475586, max_abs=1.8125, mean_rel=0.09345395117998123, max_rel=4.209083557128906, norm_rel=0.024332039058208466, ref_abs_avg=17.85465431213379, test_abs_avg=17.83889389038086
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.541487455368042, max_abs=4.5, mean_rel=0.14269231259822845, max_rel=601.8970947265625, norm_rel=0.023563425987958908, ref_abs_avg=23.00177001953125, test_abs_avg=23.001495361328125
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.4986264407634735, max_abs=4.0, mean_rel=0.18887189030647278, max_rel=3031.249755859375, norm_rel=0.021881869062781334, ref_abs_avg=22.775409698486328, test_abs_avg=22.77369499206543
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4061014652252197, max_abs=1.54345703125, mean_rel=0.12457117438316345, max_rel=35.497711181640625, norm_rel=0.02146676555275917, ref_abs_avg=19.19518280029297, test_abs_avg=19.229801177978516
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5218042731285095, max_abs=4.0, mean_rel=0.15241393446922302, max_rel=1172.396484375, norm_rel=0.02354559488594532, ref_abs_avg=22.16229248046875, test_abs_avg=22.161211013793945
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4745216965675354, max_abs=3.25, mean_rel=0.21060970425605774, max_rel=1406.2498779296875, norm_rel=0.02149594947695732, ref_abs_avg=22.01636505126953, test_abs_avg=22.01820945739746
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4496755599975586, max_abs=2.0, mean_rel=0.0748739242553711, max_rel=2.416743278503418, norm_rel=0.021083034574985504, ref_abs_avg=21.512020111083984, test_abs_avg=21.491230010986328
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5913629531860352, max_abs=4.5, mean_rel=0.15718305110931396, max_rel=1298.3927001953125, norm_rel=0.024721339344978333, ref_abs_avg=23.97163200378418, test_abs_avg=23.972705841064453
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5476731657981873, max_abs=4.0, mean_rel=0.23604433238506317, max_rel=1374.9998779296875, norm_rel=0.023045392706990242, ref_abs_avg=23.762033462524414, test_abs_avg=23.759929656982422
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4283072054386139, max_abs=1.53125, mean_rel=0.35039228200912476, max_rel=69.45417022705078, norm_rel=0.02189447544515133, ref_abs_avg=19.61001968383789, test_abs_avg=19.616039276123047
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5453721284866333, max_abs=4.5, mean_rel=0.1567855328321457, max_rel=1704.9373779296875, norm_rel=0.023767638951539993, ref_abs_avg=22.980398178100586, test_abs_avg=22.978900909423828
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5101264715194702, max_abs=4.0, mean_rel=0.24221576750278473, max_rel=2437.5, norm_rel=0.022698255255818367, ref_abs_avg=22.502626419067383, test_abs_avg=22.50489044189453
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.4117014408111572, max_abs=1.40625, mean_rel=0.1603708565235138, max_rel=33.351749420166016, norm_rel=0.02187327668070793, ref_abs_avg=18.391765594482422, test_abs_avg=18.363906860351562
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5143561959266663, max_abs=4.625, mean_rel=0.14813414216041565, max_rel=1075.052001953125, norm_rel=0.023502962663769722, ref_abs_avg=21.93335723876953, test_abs_avg=21.93204116821289
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.47362020611763, max_abs=5.75, mean_rel=0.1943175494670868, max_rel=1265.625, norm_rel=0.021902818232774734, ref_abs_avg=21.655445098876953, test_abs_avg=21.650239944458008
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.383406400680542, max_abs=1.5, mean_rel=0.1230970174074173, max_rel=11.738945007324219, norm_rel=0.021332034841179848, ref_abs_avg=18.091938018798828, test_abs_avg=18.084945678710938
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4760793447494507, max_abs=4.125, mean_rel=0.14900393784046173, max_rel=772.1013793945312, norm_rel=0.023046985268592834, ref_abs_avg=20.74256706237793, test_abs_avg=20.743350982666016
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4476642906665802, max_abs=4.0, mean_rel=0.23593369126319885, max_rel=1578.1248779296875, norm_rel=0.021802183240652084, ref_abs_avg=20.595890045166016, test_abs_avg=20.605499267578125
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3586080074310303, max_abs=1.75, mean_rel=0.16462820768356323, max_rel=28.620351791381836, norm_rel=0.022434137761592865, ref_abs_avg=16.10385513305664, test_abs_avg=16.112777709960938
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.44909918308258057, max_abs=4.75, mean_rel=0.1354646384716034, max_rel=734.030029296875, norm_rel=0.02240675874054432, ref_abs_avg=20.154205322265625, test_abs_avg=20.1544132232666
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4084208607673645, max_abs=4.0, mean_rel=0.18654081225395203, max_rel=1812.4998779296875, norm_rel=0.02095421962440014, ref_abs_avg=19.66566276550293, test_abs_avg=19.659679412841797
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.33124399185180664, max_abs=1.5, mean_rel=0.193566232919693, max_rel=67.72542572021484, norm_rel=0.020068945363163948, ref_abs_avg=16.784847259521484, test_abs_avg=16.766475677490234
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.42084574699401855, max_abs=5.0, mean_rel=0.1277519017457962, max_rel=578.7286987304688, norm_rel=0.021814586594700813, ref_abs_avg=19.47226333618164, test_abs_avg=19.471384048461914
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3761844038963318, max_abs=3.9375, mean_rel=0.1975533664226532, max_rel=1312.4998779296875, norm_rel=0.019886337220668793, ref_abs_avg=18.91483497619629, test_abs_avg=18.91879653930664
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3081238269805908, max_abs=1.4375, mean_rel=0.0513337105512619, max_rel=1.3407222032546997, norm_rel=0.020152324810624123, ref_abs_avg=15.877608299255371, test_abs_avg=15.874534606933594
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.403716504573822, max_abs=5.0, mean_rel=0.12783598899841309, max_rel=609.2587280273438, norm_rel=0.021507002413272858, ref_abs_avg=19.000743865966797, test_abs_avg=18.999691009521484
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.361874520778656, max_abs=4.0, mean_rel=0.16804763674736023, max_rel=1499.9998779296875, norm_rel=0.01970876008272171, ref_abs_avg=18.61383819580078, test_abs_avg=18.62027359008789
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2802252769470215, max_abs=1.1875, mean_rel=0.08937983214855194, max_rel=5.728381156921387, norm_rel=0.01849467307329178, ref_abs_avg=14.887700080871582, test_abs_avg=14.885221481323242
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.37304913997650146, max_abs=5.0, mean_rel=0.11906930059194565, max_rel=455.03997802734375, norm_rel=0.0210439283400774, ref_abs_avg=18.022960662841797, test_abs_avg=18.02284049987793
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.35069334506988525, max_abs=4.5, mean_rel=0.16175216436386108, max_rel=1187.5, norm_rel=0.020178571343421936, ref_abs_avg=17.895389556884766, test_abs_avg=17.890792846679688
production_forward2 vs paper_forward output: mean_abs=0.0016410724492743611, max_abs=0.041015625
production_forward2 grad[0] vs paper_forward: mean_abs=0.008319206535816193, max_abs=0.578125, mean_rel=0.07162776589393616, max_rel=103.6353759765625, norm_rel=0.01968514360487461, ref_abs_avg=0.4604714512825012, test_abs_avg=0.46048444509506226
production_forward2 grad[1] vs paper_forward: mean_abs=7.25969934463501, max_abs=49.0, mean_rel=0.19108334183692932, max_rel=1082.11328125, norm_rel=0.02066166140139103, ref_abs_avg=317.243408203125, test_abs_avg=317.32293701171875
production_forward2 grad[2] vs paper_forward: mean_abs=1.1801280975341797, max_abs=5.0, mean_rel=0.0965026468038559, max_rel=5.137842178344727, norm_rel=0.02242247387766838, ref_abs_avg=52.96527862548828, test_abs_avg=52.88824462890625
production_forward2 grad[3] vs paper_forward: mean_abs=1.579169511795044, max_abs=11.0, mean_rel=0.18002283573150635, max_rel=4320.224609375, norm_rel=0.024408744648098946, ref_abs_avg=65.1314468383789, test_abs_avg=65.12832641601562
production_forward2 grad[4] vs paper_forward: mean_abs=1.462282657623291, max_abs=11.0, mean_rel=0.33800601959228516, max_rel=3062.499755859375, norm_rel=0.02283172868192196, ref_abs_avg=64.41191864013672, test_abs_avg=64.40867614746094
production_forward2 grad[5] vs paper_forward: mean_abs=1.1004072427749634, max_abs=4.0, mean_rel=0.08155535161495209, max_rel=3.7611658573150635, norm_rel=0.023828351870179176, ref_abs_avg=47.63768005371094, test_abs_avg=47.59792709350586
production_forward2 grad[6] vs paper_forward: mean_abs=1.4097466468811035, max_abs=10.0, mean_rel=0.15224063396453857, max_rel=1486.7911376953125, norm_rel=0.02413412369787693, ref_abs_avg=58.87807846069336, test_abs_avg=58.881221771240234
production_forward2 grad[7] vs paper_forward: mean_abs=1.2949414253234863, max_abs=8.125, mean_rel=0.3408883810043335, max_rel=3937.499755859375, norm_rel=0.02236642874777317, ref_abs_avg=58.20287322998047, test_abs_avg=58.20112609863281
production_forward2 grad[8] vs paper_forward: mean_abs=1.006307601928711, max_abs=4.53125, mean_rel=0.0731959268450737, max_rel=10.956306457519531, norm_rel=0.022914182394742966, ref_abs_avg=43.648765563964844, test_abs_avg=43.5791015625
production_forward2 grad[9] vs paper_forward: mean_abs=1.2777938842773438, max_abs=9.0, mean_rel=0.15939754247665405, max_rel=1497.72314453125, norm_rel=0.023979684337973595, ref_abs_avg=53.635597229003906, test_abs_avg=53.63672637939453
production_forward2 grad[10] vs paper_forward: mean_abs=1.1821503639221191, max_abs=7.34375, mean_rel=0.3740265369415283, max_rel=3624.999755859375, norm_rel=0.022362804040312767, ref_abs_avg=53.098655700683594, test_abs_avg=53.10496520996094
production_forward2 grad[11] vs paper_forward: mean_abs=0.9199647903442383, max_abs=3.5, mean_rel=0.1271977424621582, max_rel=28.247100830078125, norm_rel=0.02293217182159424, ref_abs_avg=41.271697998046875, test_abs_avg=41.26821517944336
production_forward2 grad[12] vs paper_forward: mean_abs=1.1837594509124756, max_abs=9.0, mean_rel=0.152363583445549, max_rel=1402.2169189453125, norm_rel=0.02381269820034504, ref_abs_avg=50.05084991455078, test_abs_avg=50.05415344238281
production_forward2 grad[13] vs paper_forward: mean_abs=1.0849943161010742, max_abs=7.0, mean_rel=0.29307910799980164, max_rel=3124.999755859375, norm_rel=0.022124582901597023, ref_abs_avg=49.304466247558594, test_abs_avg=49.31066131591797
production_forward2 grad[14] vs paper_forward: mean_abs=0.8813962936401367, max_abs=3.0, mean_rel=0.12473294138908386, max_rel=11.239185333251953, norm_rel=0.02430218830704689, ref_abs_avg=36.03278350830078, test_abs_avg=36.06724166870117
production_forward2 grad[15] vs paper_forward: mean_abs=1.1092122793197632, max_abs=8.0, mean_rel=0.1590358018875122, max_rel=2058.944091796875, norm_rel=0.02367069013416767, ref_abs_avg=47.13311004638672, test_abs_avg=47.13728332519531
production_forward2 grad[16] vs paper_forward: mean_abs=1.0196797847747803, max_abs=6.25, mean_rel=0.3300090432167053, max_rel=4812.5, norm_rel=0.022095760330557823, ref_abs_avg=46.403106689453125, test_abs_avg=46.40524673461914
production_forward2 grad[17] vs paper_forward: mean_abs=0.8176059722900391, max_abs=3.75, mean_rel=0.11726100742816925, max_rel=7.800105571746826, norm_rel=0.023043159395456314, ref_abs_avg=35.307735443115234, test_abs_avg=35.26032257080078
production_forward2 grad[18] vs paper_forward: mean_abs=1.0373969078063965, max_abs=7.0, mean_rel=0.16247686743736267, max_rel=1542.7904052734375, norm_rel=0.023618340492248535, ref_abs_avg=44.22636413574219, test_abs_avg=44.23026657104492
production_forward2 grad[19] vs paper_forward: mean_abs=0.95948326587677, max_abs=6.25, mean_rel=0.2978738248348236, max_rel=2624.999755859375, norm_rel=0.022123362869024277, ref_abs_avg=43.63001251220703, test_abs_avg=43.636112213134766
production_forward2 grad[20] vs paper_forward: mean_abs=0.7751684188842773, max_abs=2.875, mean_rel=0.09294810146093369, max_rel=7.359677314758301, norm_rel=0.021755076944828033, ref_abs_avg=35.964942932128906, test_abs_avg=35.960472106933594
production_forward2 grad[21] vs paper_forward: mean_abs=0.9850039482116699, max_abs=8.0, mean_rel=0.15506155788898468, max_rel=1026.2646484375, norm_rel=0.023495608940720558, ref_abs_avg=42.15679168701172, test_abs_avg=42.15882873535156
production_forward2 grad[22] vs paper_forward: mean_abs=0.9021918177604675, max_abs=6.0, mean_rel=0.28571611642837524, max_rel=2593.749755859375, norm_rel=0.021760201081633568, ref_abs_avg=41.65003967285156, test_abs_avg=41.65376663208008
production_forward2 grad[23] vs paper_forward: mean_abs=0.7249305248260498, max_abs=3.5, mean_rel=0.0726395845413208, max_rel=3.9770474433898926, norm_rel=0.02270941063761711, ref_abs_avg=33.03199005126953, test_abs_avg=33.025875091552734
production_forward2 grad[24] vs paper_forward: mean_abs=0.9442912340164185, max_abs=7.0, mean_rel=0.15563669800758362, max_rel=1809.12451171875, norm_rel=0.02327927015721798, ref_abs_avg=40.84514617919922, test_abs_avg=40.84474182128906
production_forward2 grad[25] vs paper_forward: mean_abs=0.8647905588150024, max_abs=5.0, mean_rel=0.24295209348201752, max_rel=2125.0, norm_rel=0.021637756377458572, ref_abs_avg=40.23675537109375, test_abs_avg=40.23945999145508
production_forward2 grad[26] vs paper_forward: mean_abs=0.833648681640625, max_abs=3.0, mean_rel=0.10035983473062515, max_rel=11.662466049194336, norm_rel=0.0225928146392107, ref_abs_avg=37.84254837036133, test_abs_avg=37.94187545776367
production_forward2 grad[27] vs paper_forward: mean_abs=1.0886896848678589, max_abs=7.125, mean_rel=0.16257639229297638, max_rel=1427.86181640625, norm_rel=0.025130687281489372, ref_abs_avg=43.57592010498047, test_abs_avg=43.580448150634766
production_forward2 grad[28] vs paper_forward: mean_abs=1.0089184045791626, max_abs=6.0, mean_rel=0.3192363977432251, max_rel=4625.0, norm_rel=0.023560568690299988, ref_abs_avg=43.09282684326172, test_abs_avg=43.09620666503906
production_forward2 grad[29] vs paper_forward: mean_abs=0.7506685256958008, max_abs=3.0, mean_rel=0.06617866456508636, max_rel=2.993988513946533, norm_rel=0.023227807134389877, ref_abs_avg=32.62506866455078, test_abs_avg=32.597137451171875
production_forward2 grad[30] vs paper_forward: mean_abs=1.0141537189483643, max_abs=9.0, mean_rel=0.16655148565769196, max_rel=1445.1107177734375, norm_rel=0.025544704869389534, ref_abs_avg=39.957679748535156, test_abs_avg=39.96090316772461
production_forward2 grad[31] vs paper_forward: mean_abs=0.9452959299087524, max_abs=6.0, mean_rel=0.2928418219089508, max_rel=2906.249755859375, norm_rel=0.023946251720190048, ref_abs_avg=39.68671417236328, test_abs_avg=39.69540023803711
production_forward2 grad[32] vs paper_forward: mean_abs=0.6989707946777344, max_abs=2.625, mean_rel=0.07704739272594452, max_rel=1.9211852550506592, norm_rel=0.02334577962756157, ref_abs_avg=30.462757110595703, test_abs_avg=30.418365478515625
production_forward2 grad[33] vs paper_forward: mean_abs=0.9377201199531555, max_abs=6.5, mean_rel=0.1656368225812912, max_rel=1275.8785400390625, norm_rel=0.025220131501555443, ref_abs_avg=37.35237121582031, test_abs_avg=37.35234069824219
production_forward2 grad[34] vs paper_forward: mean_abs=0.870753288269043, max_abs=5.5625, mean_rel=0.3484418988227844, max_rel=3374.999755859375, norm_rel=0.023979943245649338, ref_abs_avg=36.54576110839844, test_abs_avg=36.54651641845703
production_forward2 grad[35] vs paper_forward: mean_abs=0.6926326751708984, max_abs=2.625, mean_rel=0.10888966917991638, max_rel=6.490316867828369, norm_rel=0.02382475696504116, ref_abs_avg=28.69809341430664, test_abs_avg=28.74851417541504
production_forward2 grad[36] vs paper_forward: mean_abs=0.8724910020828247, max_abs=6.0, mean_rel=0.16444607079029083, max_rel=1593.6529541015625, norm_rel=0.025060610845685005, ref_abs_avg=34.9418830871582, test_abs_avg=34.94378662109375
production_forward2 grad[37] vs paper_forward: mean_abs=0.8102838397026062, max_abs=4.625, mean_rel=0.2768130302429199, max_rel=2968.749755859375, norm_rel=0.023419378325343132, ref_abs_avg=34.69255828857422, test_abs_avg=34.686622619628906
production_forward2 grad[38] vs paper_forward: mean_abs=0.665855884552002, max_abs=2.375, mean_rel=0.11141304671764374, max_rel=6.392958164215088, norm_rel=0.02444535307586193, ref_abs_avg=26.91352081298828, test_abs_avg=26.94097137451172
production_forward2 grad[39] vs paper_forward: mean_abs=0.8279244899749756, max_abs=7.0, mean_rel=0.15489766001701355, max_rel=1174.657470703125, norm_rel=0.02481679990887642, ref_abs_avg=33.481414794921875, test_abs_avg=33.48191452026367
production_forward2 grad[40] vs paper_forward: mean_abs=0.7655484080314636, max_abs=4.75, mean_rel=0.2847217321395874, max_rel=2406.25, norm_rel=0.023388396948575974, ref_abs_avg=32.806495666503906, test_abs_avg=32.80534362792969
production_forward2 grad[41] vs paper_forward: mean_abs=0.5828123092651367, max_abs=2.75, mean_rel=0.08674265444278717, max_rel=4.3054938316345215, norm_rel=0.022196931764483452, ref_abs_avg=27.325843811035156, test_abs_avg=27.264678955078125
production_forward2 grad[42] vs paper_forward: mean_abs=0.7883636951446533, max_abs=5.5, mean_rel=0.16026368737220764, max_rel=1635.072021484375, norm_rel=0.024580324068665504, ref_abs_avg=32.16605758666992, test_abs_avg=32.16596221923828
production_forward2 grad[43] vs paper_forward: mean_abs=0.7294220924377441, max_abs=4.75, mean_rel=0.285844624042511, max_rel=2640.624755859375, norm_rel=0.023003727197647095, ref_abs_avg=31.779399871826172, test_abs_avg=31.778789520263672
production_forward2 grad[44] vs paper_forward: mean_abs=0.6096258163452148, max_abs=3.0, mean_rel=0.1588992029428482, max_rel=34.944366455078125, norm_rel=0.02366635948419571, ref_abs_avg=25.236848831176758, test_abs_avg=25.253612518310547
production_forward2 grad[45] vs paper_forward: mean_abs=0.7500619292259216, max_abs=5.5, mean_rel=0.16133928298950195, max_rel=1244.8831787109375, norm_rel=0.024345794692635536, ref_abs_avg=30.897151947021484, test_abs_avg=30.898426055908203
production_forward2 grad[46] vs paper_forward: mean_abs=0.690048336982727, max_abs=4.375, mean_rel=0.2532954216003418, max_rel=2624.999755859375, norm_rel=0.022980205714702606, ref_abs_avg=30.12343406677246, test_abs_avg=30.12572479248047
production_forward2 grad[47] vs paper_forward: mean_abs=0.5496730804443359, max_abs=2.28125, mean_rel=0.08734630048274994, max_rel=5.466137886047363, norm_rel=0.02191532775759697, ref_abs_avg=25.2907772064209, test_abs_avg=25.26406478881836
production_forward2 grad[48] vs paper_forward: mean_abs=0.7194473743438721, max_abs=5.0, mean_rel=0.15575221180915833, max_rel=850.9185791015625, norm_rel=0.024025486782193184, ref_abs_avg=30.027250289916992, test_abs_avg=30.02720069885254
production_forward2 grad[49] vs paper_forward: mean_abs=0.6648123264312744, max_abs=4.5, mean_rel=0.2118571400642395, max_rel=1749.9998779296875, norm_rel=0.022608142346143723, ref_abs_avg=29.518268585205078, test_abs_avg=29.513710021972656
production_forward2 grad[50] vs paper_forward: mean_abs=0.6249294281005859, max_abs=2.640625, mean_rel=0.14798995852470398, max_rel=32.953487396240234, norm_rel=0.025028692558407784, ref_abs_avg=25.81292724609375, test_abs_avg=25.799909591674805
production_forward2 grad[51] vs paper_forward: mean_abs=0.7952984571456909, max_abs=6.5, mean_rel=0.1733742356300354, max_rel=1288.87890625, norm_rel=0.026033027097582817, ref_abs_avg=30.628665924072266, test_abs_avg=30.629901885986328
production_forward2 grad[52] vs paper_forward: mean_abs=0.739237368106842, max_abs=5.25, mean_rel=0.30165576934814453, max_rel=2874.999755859375, norm_rel=0.02442113123834133, ref_abs_avg=30.36155891418457, test_abs_avg=30.357763290405273
production_forward2 grad[53] vs paper_forward: mean_abs=0.5597400665283203, max_abs=2.0, mean_rel=0.11748513579368591, max_rel=8.575945854187012, norm_rel=0.024888888001441956, ref_abs_avg=22.6004638671875, test_abs_avg=22.574295043945312
production_forward2 grad[54] vs paper_forward: mean_abs=0.7118804454803467, max_abs=6.5, mean_rel=0.1640949845314026, max_rel=936.0765380859375, norm_rel=0.025553440675139427, ref_abs_avg=27.926969528198242, test_abs_avg=27.929052352905273
production_forward2 grad[55] vs paper_forward: mean_abs=0.6623624563217163, max_abs=5.0, mean_rel=0.2356031835079193, max_rel=2031.2498779296875, norm_rel=0.02408292330801487, ref_abs_avg=27.60123062133789, test_abs_avg=27.596328735351562
production_forward2 grad[56] vs paper_forward: mean_abs=0.5030132532119751, max_abs=2.0, mean_rel=0.1284375637769699, max_rel=15.84483528137207, norm_rel=0.024083543568849564, ref_abs_avg=20.98218536376953, test_abs_avg=20.942697525024414
production_forward2 grad[57] vs paper_forward: mean_abs=0.6576272249221802, max_abs=5.0, mean_rel=0.15735456347465515, max_rel=1265.287353515625, norm_rel=0.025028342381119728, ref_abs_avg=26.348255157470703, test_abs_avg=26.348039627075195
production_forward2 grad[58] vs paper_forward: mean_abs=0.6168581247329712, max_abs=4.0, mean_rel=0.2766149044036865, max_rel=2687.499755859375, norm_rel=0.023702384904026985, ref_abs_avg=26.083711624145508, test_abs_avg=26.082820892333984
production_forward2 grad[59] vs paper_forward: mean_abs=0.49085187911987305, max_abs=1.875, mean_rel=0.14443139731884003, max_rel=24.248756408691406, norm_rel=0.023226676508784294, ref_abs_avg=21.204029083251953, test_abs_avg=21.234394073486328
production_forward2 grad[60] vs paper_forward: mean_abs=0.6299908757209778, max_abs=5.5, mean_rel=0.16554692387580872, max_rel=1113.51904296875, norm_rel=0.024399230256676674, ref_abs_avg=25.83926010131836, test_abs_avg=25.840587615966797
production_forward2 grad[61] vs paper_forward: mean_abs=0.5802606344223022, max_abs=3.8125, mean_rel=0.2133329212665558, max_rel=1656.2498779296875, norm_rel=0.02321852557361126, ref_abs_avg=25.01056480407715, test_abs_avg=25.01011848449707
production_forward2 grad[62] vs paper_forward: mean_abs=0.4615206718444824, max_abs=1.6328125, mean_rel=0.0996444821357727, max_rel=4.695206642150879, norm_rel=0.023654546588659286, ref_abs_avg=19.545785903930664, test_abs_avg=19.553518295288086
production_forward2 grad[63] vs paper_forward: mean_abs=0.5897378921508789, max_abs=4.125, mean_rel=0.14793577790260315, max_rel=919.9378662109375, norm_rel=0.02438078261911869, ref_abs_avg=24.251201629638672, test_abs_avg=24.248641967773438
production_forward2 grad[64] vs paper_forward: mean_abs=0.5566749572753906, max_abs=4.9375, mean_rel=0.21166102588176727, max_rel=1796.8748779296875, norm_rel=0.023271579295396805, ref_abs_avg=23.96851348876953, test_abs_avg=23.963232040405273
production_forward2 grad[65] vs paper_forward: mean_abs=0.45912957191467285, max_abs=1.875, mean_rel=0.1358402669429779, max_rel=24.362417221069336, norm_rel=0.02337219938635826, ref_abs_avg=19.310758590698242, test_abs_avg=19.366928100585938
production_forward2 grad[66] vs paper_forward: mean_abs=0.5613958835601807, max_abs=5.0, mean_rel=0.1404053121805191, max_rel=617.8242797851562, norm_rel=0.023730652406811714, ref_abs_avg=23.710424423217773, test_abs_avg=23.7113037109375
production_forward2 grad[67] vs paper_forward: mean_abs=0.5195797085762024, max_abs=4.0, mean_rel=0.24828507006168365, max_rel=2093.75, norm_rel=0.02209741249680519, ref_abs_avg=23.532163619995117, test_abs_avg=23.535694122314453
production_forward2 grad[68] vs paper_forward: mean_abs=0.4120074510574341, max_abs=1.5625, mean_rel=0.1003367230296135, max_rel=5.26877498626709, norm_rel=0.022845277562737465, ref_abs_avg=17.85465431213379, test_abs_avg=17.826175689697266
production_forward2 grad[69] vs paper_forward: mean_abs=0.5362250208854675, max_abs=5.0, mean_rel=0.14387249946594238, max_rel=844.5367431640625, norm_rel=0.023327354341745377, ref_abs_avg=23.00177001953125, test_abs_avg=23.001888275146484
production_forward2 grad[70] vs paper_forward: mean_abs=0.4929126501083374, max_abs=3.75, mean_rel=0.1890943944454193, max_rel=2718.749755859375, norm_rel=0.02164899930357933, ref_abs_avg=22.775409698486328, test_abs_avg=22.7747745513916
production_forward2 grad[71] vs paper_forward: mean_abs=0.40776968002319336, max_abs=1.67626953125, mean_rel=0.14710162580013275, max_rel=38.552242279052734, norm_rel=0.021678386256098747, ref_abs_avg=19.19518280029297, test_abs_avg=19.221961975097656
production_forward2 grad[72] vs paper_forward: mean_abs=0.5171982645988464, max_abs=4.5, mean_rel=0.15328682959079742, max_rel=1248.0328369140625, norm_rel=0.023356113582849503, ref_abs_avg=22.16229248046875, test_abs_avg=22.16176986694336
production_forward2 grad[73] vs paper_forward: mean_abs=0.47173652052879333, max_abs=3.375, mean_rel=0.21026596426963806, max_rel=1250.0, norm_rel=0.0213676355779171, ref_abs_avg=22.01636505126953, test_abs_avg=22.01759910583496
production_forward2 grad[74] vs paper_forward: mean_abs=0.44380688667297363, max_abs=2.125, mean_rel=0.07417276501655579, max_rel=2.5375804901123047, norm_rel=0.020819688215851784, ref_abs_avg=21.512020111083984, test_abs_avg=21.507007598876953
production_forward2 grad[75] vs paper_forward: mean_abs=0.5850083827972412, max_abs=5.5, mean_rel=0.15630434453487396, max_rel=728.6790771484375, norm_rel=0.024433184415102005, ref_abs_avg=23.97163200378418, test_abs_avg=23.97169303894043
production_forward2 grad[76] vs paper_forward: mean_abs=0.5400369763374329, max_abs=4.0, mean_rel=0.23764434456825256, max_rel=1562.4998779296875, norm_rel=0.022719815373420715, ref_abs_avg=23.762033462524414, test_abs_avg=23.760082244873047
production_forward2 grad[77] vs paper_forward: mean_abs=0.4327486753463745, max_abs=1.75, mean_rel=0.3898884057998657, max_rel=96.76748657226562, norm_rel=0.02188473753631115, ref_abs_avg=19.61001968383789, test_abs_avg=19.628414154052734
production_forward2 grad[78] vs paper_forward: mean_abs=0.5396877527236938, max_abs=5.5, mean_rel=0.1538805067539215, max_rel=1258.596923828125, norm_rel=0.023539992049336433, ref_abs_avg=22.980398178100586, test_abs_avg=22.978290557861328
production_forward2 grad[79] vs paper_forward: mean_abs=0.5014556646347046, max_abs=5.0, mean_rel=0.23839357495307922, max_rel=2250.0, norm_rel=0.022333920001983643, ref_abs_avg=22.502626419067383, test_abs_avg=22.504154205322266
production_forward2 grad[80] vs paper_forward: mean_abs=0.4132204055786133, max_abs=1.5, mean_rel=0.13621072471141815, max_rel=22.81290626525879, norm_rel=0.021749792620539665, ref_abs_avg=18.391765594482422, test_abs_avg=18.35148811340332
production_forward2 grad[81] vs paper_forward: mean_abs=0.5098507404327393, max_abs=5.125, mean_rel=0.14442500472068787, max_rel=741.1796264648438, norm_rel=0.0232958123087883, ref_abs_avg=21.93335723876953, test_abs_avg=21.932209014892578
production_forward2 grad[82] vs paper_forward: mean_abs=0.46933138370513916, max_abs=4.625, mean_rel=0.19571134448051453, max_rel=1531.2498779296875, norm_rel=0.02170238085091114, ref_abs_avg=21.655445098876953, test_abs_avg=21.651138305664062
production_forward2 grad[83] vs paper_forward: mean_abs=0.3721647262573242, max_abs=1.5, mean_rel=0.11163748800754547, max_rel=12.271679878234863, norm_rel=0.021077988669276237, ref_abs_avg=18.091938018798828, test_abs_avg=18.095794677734375
production_forward2 grad[84] vs paper_forward: mean_abs=0.47209128737449646, max_abs=4.0, mean_rel=0.14598335325717926, max_rel=711.8426513671875, norm_rel=0.022867344319820404, ref_abs_avg=20.74256706237793, test_abs_avg=20.743423461914062
production_forward2 grad[85] vs paper_forward: mean_abs=0.4472312331199646, max_abs=4.25, mean_rel=0.22854052484035492, max_rel=1312.4998779296875, norm_rel=0.021770060062408447, ref_abs_avg=20.595890045166016, test_abs_avg=20.599576950073242
production_forward2 grad[86] vs paper_forward: mean_abs=0.34944963455200195, max_abs=1.5625, mean_rel=0.13857115805149078, max_rel=20.52721405029297, norm_rel=0.021760523319244385, ref_abs_avg=16.10385513305664, test_abs_avg=16.120059967041016
production_forward2 grad[87] vs paper_forward: mean_abs=0.44546782970428467, max_abs=4.5, mean_rel=0.13183726370334625, max_rel=623.6146850585938, norm_rel=0.022224698215723038, ref_abs_avg=20.154205322265625, test_abs_avg=20.154510498046875
production_forward2 grad[88] vs paper_forward: mean_abs=0.4050893187522888, max_abs=4.0, mean_rel=0.17546914517879486, max_rel=1687.4998779296875, norm_rel=0.020766213536262512, ref_abs_avg=19.66566276550293, test_abs_avg=19.664653778076172
production_forward2 grad[89] vs paper_forward: mean_abs=0.3320337235927582, max_abs=1.5, mean_rel=0.31250688433647156, max_rel=127.10970306396484, norm_rel=0.020256411284208298, ref_abs_avg=16.784847259521484, test_abs_avg=16.768882751464844
production_forward2 grad[90] vs paper_forward: mean_abs=0.4185881018638611, max_abs=4.5, mean_rel=0.1285315752029419, max_rel=639.8353881835938, norm_rel=0.02169889770448208, ref_abs_avg=19.47226333618164, test_abs_avg=19.471574783325195
production_forward2 grad[91] vs paper_forward: mean_abs=0.3768962025642395, max_abs=3.5, mean_rel=0.1979888677597046, max_rel=1062.5, norm_rel=0.01998995617032051, ref_abs_avg=18.91483497619629, test_abs_avg=18.913963317871094
production_forward2 grad[92] vs paper_forward: mean_abs=0.31238794326782227, max_abs=1.25, mean_rel=0.0611598938703537, max_rel=4.468632698059082, norm_rel=0.01996074989438057, ref_abs_avg=15.877608299255371, test_abs_avg=15.870391845703125
production_forward2 grad[93] vs paper_forward: mean_abs=0.4019070565700531, max_abs=6.0, mean_rel=0.12926772236824036, max_rel=548.0787353515625, norm_rel=0.021402398124337196, ref_abs_avg=19.000743865966797, test_abs_avg=18.999818801879883
production_forward2 grad[94] vs paper_forward: mean_abs=0.3614615797996521, max_abs=4.0, mean_rel=0.16877591609954834, max_rel=1453.1248779296875, norm_rel=0.019693491980433464, ref_abs_avg=18.61383819580078, test_abs_avg=18.621437072753906
production_forward2 grad[95] vs paper_forward: mean_abs=0.27647167444229126, max_abs=1.0625, mean_rel=0.09872174263000488, max_rel=6.252868175506592, norm_rel=0.01821029558777809, ref_abs_avg=14.887700080871582, test_abs_avg=14.884265899658203
production_forward2 grad[96] vs paper_forward: mean_abs=0.3716685175895691, max_abs=4.75, mean_rel=0.11922945827245712, max_rel=574.2277221679688, norm_rel=0.020969359204173088, ref_abs_avg=18.022960662841797, test_abs_avg=18.021589279174805
production_forward2 grad[97] vs paper_forward: mean_abs=0.3431662321090698, max_abs=4.25, mean_rel=0.15938502550125122, max_rel=999.9999389648438, norm_rel=0.01963888294994831, ref_abs_avg=17.895389556884766, test_abs_avg=17.887432098388672
identity layers + randn queries
production_forward2 fwd+bwd:  113.584 ms
production_forward2 bwd-only: 96.000 ms
production_forward2 peak allocated: fwd=3.071 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=3.301 GiB, fwd+bwd=11.301 GiB
paper_forward fwd+bwd:  381.809 ms
paper_forward bwd-only: 301.438 ms
paper_forward peak allocated: fwd=29.706 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.721 GiB, fwd+bwd=32.471 GiB
production_forward fwd+bwd:  114.447 ms
production_forward bwd-only: 96.006 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.301 GiB, fwd+bwd=10.301 GiB
torch_compile_phases_forward fwd+bwd:  166.824 ms
torch_compile_phases_forward bwd-only: 132.537 ms
torch_compile_phases_forward peak allocated: fwd=12.781 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.078 GiB, fwd+bwd=17.330 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016872197156772017, max_abs=0.05078125
production_forward grad[0] vs paper_forward: mean_abs=0.008551599457859993, max_abs=0.5625, mean_rel=0.07273846864700317, max_rel=107.10929107666016, norm_rel=0.019876210018992424, ref_abs_avg=0.4674391746520996, test_abs_avg=0.46745866537094116
production_forward grad[1] vs paper_forward: mean_abs=7.474067687988281, max_abs=64.0, mean_rel=0.173776775598526, max_rel=504.2858581542969, norm_rel=0.020953016355633736, ref_abs_avg=322.45733642578125, test_abs_avg=322.57855224609375
production_forward grad[2] vs paper_forward: mean_abs=1.3420658111572266, max_abs=5.0, mean_rel=0.25467759370803833, max_rel=80.53822326660156, norm_rel=0.023841671645641327, ref_abs_avg=56.52017593383789, test_abs_avg=56.48738098144531
production_forward grad[3] vs paper_forward: mean_abs=1.6457786560058594, max_abs=11.0, mean_rel=0.1798223853111267, max_rel=4514.17919921875, norm_rel=0.024486646056175232, ref_abs_avg=67.65940856933594, test_abs_avg=67.67076110839844
production_forward grad[4] vs paper_forward: mean_abs=1.505303144454956, max_abs=9.0, mean_rel=0.4671463966369629, max_rel=4750.0, norm_rel=0.022871345281600952, ref_abs_avg=66.2699203491211, test_abs_avg=66.27244567871094
production_forward grad[5] vs paper_forward: mean_abs=1.1136054992675781, max_abs=4.5, mean_rel=0.07916216552257538, max_rel=5.741725921630859, norm_rel=0.02235371805727482, ref_abs_avg=51.672035217285156, test_abs_avg=51.63356399536133
production_forward grad[6] vs paper_forward: mean_abs=1.4368915557861328, max_abs=12.0, mean_rel=0.16988489031791687, max_rel=1496.5872802734375, norm_rel=0.024196049198508263, ref_abs_avg=59.84806823730469, test_abs_avg=59.85609436035156
production_forward grad[7] vs paper_forward: mean_abs=1.3311927318572998, max_abs=8.6875, mean_rel=0.44665777683258057, max_rel=4875.0, norm_rel=0.02277240715920925, ref_abs_avg=58.78927230834961, test_abs_avg=58.80188751220703
production_forward grad[8] vs paper_forward: mean_abs=0.9653756618499756, max_abs=4.0, mean_rel=0.09152165055274963, max_rel=6.889003753662109, norm_rel=0.023343486711382866, ref_abs_avg=42.16180419921875, test_abs_avg=42.20882797241211
production_forward grad[9] vs paper_forward: mean_abs=1.2969467639923096, max_abs=9.0, mean_rel=0.15823394060134888, max_rel=2696.779541015625, norm_rel=0.02398495562374592, ref_abs_avg=54.469337463378906, test_abs_avg=54.475929260253906
production_forward grad[10] vs paper_forward: mean_abs=1.1888290643692017, max_abs=7.5, mean_rel=0.3724581301212311, max_rel=3312.499755859375, norm_rel=0.022216137498617172, ref_abs_avg=53.84663391113281, test_abs_avg=53.85822677612305
production_forward grad[11] vs paper_forward: mean_abs=0.9061527252197266, max_abs=4.0, mean_rel=0.14103491604328156, max_rel=15.799010276794434, norm_rel=0.021554026752710342, ref_abs_avg=41.378841400146484, test_abs_avg=41.35517501831055
production_forward grad[12] vs paper_forward: mean_abs=1.202732801437378, max_abs=9.0, mean_rel=0.17177915573120117, max_rel=2159.06298828125, norm_rel=0.02380417101085186, ref_abs_avg=50.82490539550781, test_abs_avg=50.828277587890625
production_forward grad[13] vs paper_forward: mean_abs=1.1064610481262207, max_abs=7.5, mean_rel=0.29513779282569885, max_rel=2562.5, norm_rel=0.02200656570494175, ref_abs_avg=50.50739288330078, test_abs_avg=50.514862060546875
production_forward grad[14] vs paper_forward: mean_abs=0.8920211791992188, max_abs=3.125, mean_rel=0.0727154016494751, max_rel=6.376576900482178, norm_rel=0.021959960460662842, ref_abs_avg=41.30524444580078, test_abs_avg=41.29331970214844
production_forward grad[15] vs paper_forward: mean_abs=1.128957748413086, max_abs=8.0, mean_rel=0.16751982271671295, max_rel=3121.963134765625, norm_rel=0.023742860183119774, ref_abs_avg=47.83376693725586, test_abs_avg=47.83708190917969
production_forward grad[16] vs paper_forward: mean_abs=1.034358024597168, max_abs=6.25, mean_rel=0.3673507571220398, max_rel=4500.0, norm_rel=0.022004153579473495, ref_abs_avg=47.21449661254883, test_abs_avg=47.21709060668945
production_forward grad[17] vs paper_forward: mean_abs=0.8443450927734375, max_abs=3.0, mean_rel=0.07604789733886719, max_rel=4.132279396057129, norm_rel=0.022458327934145927, ref_abs_avg=37.06631851196289, test_abs_avg=37.020233154296875
production_forward grad[18] vs paper_forward: mean_abs=1.069697380065918, max_abs=7.0, mean_rel=0.15016043186187744, max_rel=1306.1986083984375, norm_rel=0.023577401414513588, ref_abs_avg=45.646175384521484, test_abs_avg=45.64996337890625
production_forward grad[19] vs paper_forward: mean_abs=0.9744071960449219, max_abs=6.625, mean_rel=0.3076031506061554, max_rel=2687.499755859375, norm_rel=0.02193848043680191, ref_abs_avg=44.634307861328125, test_abs_avg=44.63795852661133
production_forward grad[20] vs paper_forward: mean_abs=0.7489404678344727, max_abs=3.125, mean_rel=0.1016983836889267, max_rel=4.330184459686279, norm_rel=0.020960092544555664, ref_abs_avg=36.09153366088867, test_abs_avg=36.076873779296875
production_forward grad[21] vs paper_forward: mean_abs=1.0070570707321167, max_abs=7.0, mean_rel=0.15184970200061798, max_rel=2147.791259765625, norm_rel=0.023393943905830383, ref_abs_avg=43.35321807861328, test_abs_avg=43.357521057128906
production_forward grad[22] vs paper_forward: mean_abs=0.9267981052398682, max_abs=5.5, mean_rel=0.2834135890007019, max_rel=3312.499755859375, norm_rel=0.021840546280145645, ref_abs_avg=42.6602783203125, test_abs_avg=42.66387176513672
production_forward grad[23] vs paper_forward: mean_abs=0.7393093109130859, max_abs=3.15625, mean_rel=0.08367198705673218, max_rel=8.410897254943848, norm_rel=0.02224225178360939, ref_abs_avg=34.60549545288086, test_abs_avg=34.57575988769531
production_forward grad[24] vs paper_forward: mean_abs=0.9627800583839417, max_abs=7.0, mean_rel=0.14418098330497742, max_rel=866.320556640625, norm_rel=0.023451222106814384, ref_abs_avg=41.29863739013672, test_abs_avg=41.303077697753906
production_forward grad[25] vs paper_forward: mean_abs=0.88667893409729, max_abs=5.5, mean_rel=0.26691311597824097, max_rel=3624.999755859375, norm_rel=0.021697087213397026, ref_abs_avg=41.05302047729492, test_abs_avg=41.04827117919922
production_forward grad[26] vs paper_forward: mean_abs=0.8568909168243408, max_abs=3.25, mean_rel=0.17955173552036285, max_rel=40.4127082824707, norm_rel=0.024586042389273643, ref_abs_avg=35.17753219604492, test_abs_avg=35.15394973754883
production_forward grad[27] vs paper_forward: mean_abs=1.0935394763946533, max_abs=7.5, mean_rel=0.17454561591148376, max_rel=1484.7252197265625, norm_rel=0.025089876726269722, ref_abs_avg=43.8392448425293, test_abs_avg=43.84476852416992
production_forward grad[28] vs paper_forward: mean_abs=1.0180130004882812, max_abs=6.0, mean_rel=0.3981204330921173, max_rel=3328.124755859375, norm_rel=0.02358252927660942, ref_abs_avg=43.37293243408203, test_abs_avg=43.37947082519531
production_forward grad[29] vs paper_forward: mean_abs=0.8301988840103149, max_abs=3.25, mean_rel=0.4584965705871582, max_rel=142.06088256835938, norm_rel=0.024882972240447998, ref_abs_avg=32.582542419433594, test_abs_avg=32.608917236328125
production_forward grad[30] vs paper_forward: mean_abs=1.0203545093536377, max_abs=7.0, mean_rel=0.16266818344593048, max_rel=1339.7972412109375, norm_rel=0.025309044867753983, ref_abs_avg=40.524478912353516, test_abs_avg=40.53015899658203
production_forward grad[31] vs paper_forward: mean_abs=0.9560270309448242, max_abs=6.25, mean_rel=0.32149070501327515, max_rel=3124.999755859375, norm_rel=0.023993708193302155, ref_abs_avg=39.95964050292969, test_abs_avg=39.961509704589844
production_forward grad[32] vs paper_forward: mean_abs=0.7273745536804199, max_abs=3.5, mean_rel=0.10161270201206207, max_rel=19.764142990112305, norm_rel=0.024243805557489395, ref_abs_avg=30.84677505493164, test_abs_avg=30.844520568847656
production_forward grad[33] vs paper_forward: mean_abs=0.9519269466400146, max_abs=6.5, mean_rel=0.16984492540359497, max_rel=1159.0625, norm_rel=0.02542865090072155, ref_abs_avg=37.64191818237305, test_abs_avg=37.64471435546875
production_forward grad[34] vs paper_forward: mean_abs=0.8918027281761169, max_abs=5.5, mean_rel=0.279116153717041, max_rel=2531.25, norm_rel=0.02402743138372898, ref_abs_avg=37.26817321777344, test_abs_avg=37.27332305908203
production_forward grad[35] vs paper_forward: mean_abs=0.6918735504150391, max_abs=2.5, mean_rel=0.12169970571994781, max_rel=17.082374572753906, norm_rel=0.02347376197576523, ref_abs_avg=30.152812957763672, test_abs_avg=30.151004791259766
production_forward grad[36] vs paper_forward: mean_abs=0.9052906036376953, max_abs=6.5, mean_rel=0.15354010462760925, max_rel=963.443603515625, norm_rel=0.025111591443419456, ref_abs_avg=36.175132751464844, test_abs_avg=36.18016815185547
production_forward grad[37] vs paper_forward: mean_abs=0.8412792682647705, max_abs=5.25, mean_rel=0.3056311011314392, max_rel=2906.249755859375, norm_rel=0.023862840607762337, ref_abs_avg=35.35529327392578, test_abs_avg=35.35234451293945
production_forward grad[38] vs paper_forward: mean_abs=0.6737604141235352, max_abs=3.5, mean_rel=0.10972357541322708, max_rel=19.020904541015625, norm_rel=0.025171751156449318, ref_abs_avg=27.759523391723633, test_abs_avg=27.751575469970703
production_forward grad[39] vs paper_forward: mean_abs=0.8520711660385132, max_abs=5.5, mean_rel=0.1726064383983612, max_rel=1181.1961669921875, norm_rel=0.024906625971198082, ref_abs_avg=34.28187942504883, test_abs_avg=34.284645080566406
production_forward grad[40] vs paper_forward: mean_abs=0.7905338406562805, max_abs=4.75, mean_rel=0.32561126351356506, max_rel=2312.5, norm_rel=0.023410949856042862, ref_abs_avg=33.829437255859375, test_abs_avg=33.83741760253906
production_forward grad[41] vs paper_forward: mean_abs=0.6450376510620117, max_abs=2.59375, mean_rel=0.07118498533964157, max_rel=5.697805404663086, norm_rel=0.022955888882279396, ref_abs_avg=28.64440155029297, test_abs_avg=28.674720764160156
production_forward grad[42] vs paper_forward: mean_abs=0.8059067726135254, max_abs=5.0, mean_rel=0.16593816876411438, max_rel=1219.42626953125, norm_rel=0.024747470393776894, ref_abs_avg=32.640724182128906, test_abs_avg=32.64277648925781
production_forward grad[43] vs paper_forward: mean_abs=0.7510231137275696, max_abs=5.125, mean_rel=0.26123058795928955, max_rel=2187.5, norm_rel=0.02326369658112526, ref_abs_avg=32.32013702392578, test_abs_avg=32.320960998535156
production_forward grad[44] vs paper_forward: mean_abs=0.5742101669311523, max_abs=2.3125, mean_rel=0.26748213171958923, max_rel=49.920127868652344, norm_rel=0.022840505465865135, ref_abs_avg=25.098264694213867, test_abs_avg=25.096752166748047
production_forward grad[45] vs paper_forward: mean_abs=0.7697930335998535, max_abs=7.0, mean_rel=0.16364622116088867, max_rel=1067.6025390625, norm_rel=0.02446787618100643, ref_abs_avg=31.541820526123047, test_abs_avg=31.545408248901367
production_forward grad[46] vs paper_forward: mean_abs=0.7159220576286316, max_abs=5.0, mean_rel=0.27761608362197876, max_rel=2406.25, norm_rel=0.022858932614326477, ref_abs_avg=31.29827117919922, test_abs_avg=31.300458908081055
production_forward grad[47] vs paper_forward: mean_abs=0.5608310699462891, max_abs=2.21875, mean_rel=0.11113782227039337, max_rel=8.630021095275879, norm_rel=0.02235354669392109, ref_abs_avg=24.916790008544922, test_abs_avg=24.90578842163086
production_forward grad[48] vs paper_forward: mean_abs=0.7385985851287842, max_abs=6.0, mean_rel=0.16628201305866241, max_rel=937.790283203125, norm_rel=0.024302834644913673, ref_abs_avg=30.514930725097656, test_abs_avg=30.517047882080078
production_forward grad[49] vs paper_forward: mean_abs=0.6830429434776306, max_abs=5.0, mean_rel=0.22722077369689941, max_rel=1656.2498779296875, norm_rel=0.022349320352077484, ref_abs_avg=30.528940200805664, test_abs_avg=30.532907485961914
production_forward grad[50] vs paper_forward: mean_abs=0.6736202239990234, max_abs=2.75, mean_rel=0.08827956020832062, max_rel=4.394557476043701, norm_rel=0.024744898080825806, ref_abs_avg=26.441181182861328, test_abs_avg=26.535572052001953
production_forward grad[51] vs paper_forward: mean_abs=0.8388605117797852, max_abs=6.25, mean_rel=0.16807690262794495, max_rel=911.7340698242188, norm_rel=0.025675605982542038, ref_abs_avg=32.79296112060547, test_abs_avg=32.79286575317383
production_forward grad[52] vs paper_forward: mean_abs=0.7784501314163208, max_abs=5.375, mean_rel=0.2911250591278076, max_rel=2125.0, norm_rel=0.024600215256214142, ref_abs_avg=31.69656753540039, test_abs_avg=31.688190460205078
production_forward grad[53] vs paper_forward: mean_abs=0.5857810974121094, max_abs=2.0, mean_rel=0.11018098890781403, max_rel=7.639241695404053, norm_rel=0.02451466955244541, ref_abs_avg=23.763731002807617, test_abs_avg=23.748353958129883
production_forward grad[54] vs paper_forward: mean_abs=0.7628574371337891, max_abs=6.0, mean_rel=0.169779971241951, max_rel=1189.89404296875, norm_rel=0.025350559502840042, ref_abs_avg=30.182022094726562, test_abs_avg=30.183229446411133
production_forward grad[55] vs paper_forward: mean_abs=0.7055602073669434, max_abs=4.25, mean_rel=0.27687400579452515, max_rel=2203.125, norm_rel=0.023760676383972168, ref_abs_avg=29.71929168701172, test_abs_avg=29.723670959472656
production_forward grad[56] vs paper_forward: mean_abs=0.5450713634490967, max_abs=2.25, mean_rel=0.09172242879867554, max_rel=7.530482292175293, norm_rel=0.022699572145938873, ref_abs_avg=24.213947296142578, test_abs_avg=24.183673858642578
production_forward grad[57] vs paper_forward: mean_abs=0.7083698511123657, max_abs=6.0, mean_rel=0.1578013002872467, max_rel=1412.008056640625, norm_rel=0.024773171171545982, ref_abs_avg=28.638219833374023, test_abs_avg=28.64082908630371
production_forward grad[58] vs paper_forward: mean_abs=0.6511149406433105, max_abs=5.3125, mean_rel=0.2808467745780945, max_rel=2874.999755859375, norm_rel=0.023131215944886208, ref_abs_avg=28.098445892333984, test_abs_avg=28.092506408691406
production_forward grad[59] vs paper_forward: mean_abs=0.5373592376708984, max_abs=2.25, mean_rel=0.12976893782615662, max_rel=25.936038970947266, norm_rel=0.024710139259696007, ref_abs_avg=21.586933135986328, test_abs_avg=21.58340072631836
production_forward grad[60] vs paper_forward: mean_abs=0.6591979265213013, max_abs=5.25, mean_rel=0.15788626670837402, max_rel=1069.58349609375, norm_rel=0.024516670033335686, ref_abs_avg=26.92159652709961, test_abs_avg=26.921266555786133
production_forward grad[61] vs paper_forward: mean_abs=0.610370934009552, max_abs=4.5, mean_rel=0.24591809511184692, max_rel=1562.4998779296875, norm_rel=0.02304510399699211, ref_abs_avg=26.438827514648438, test_abs_avg=26.433921813964844
production_forward grad[62] vs paper_forward: mean_abs=0.4959786534309387, max_abs=2.0, mean_rel=0.25389644503593445, max_rel=70.07051086425781, norm_rel=0.022942831739783287, ref_abs_avg=20.833148956298828, test_abs_avg=20.868335723876953
production_forward grad[63] vs paper_forward: mean_abs=0.6206351518630981, max_abs=5.5, mean_rel=0.1534235179424286, max_rel=1372.43505859375, norm_rel=0.023971037939190865, ref_abs_avg=25.910940170288086, test_abs_avg=25.910259246826172
production_forward grad[64] vs paper_forward: mean_abs=0.5737791061401367, max_abs=4.375, mean_rel=0.22974181175231934, max_rel=1749.9998779296875, norm_rel=0.02262549102306366, ref_abs_avg=25.379470825195312, test_abs_avg=25.38043975830078
production_forward grad[65] vs paper_forward: mean_abs=0.45674800872802734, max_abs=1.75, mean_rel=0.11653141677379608, max_rel=17.018415451049805, norm_rel=0.02363068424165249, ref_abs_avg=19.992420196533203, test_abs_avg=19.966283798217773
production_forward grad[66] vs paper_forward: mean_abs=0.5809087753295898, max_abs=5.0, mean_rel=0.1493690311908722, max_rel=781.0452270507812, norm_rel=0.023827064782381058, ref_abs_avg=24.39061737060547, test_abs_avg=24.390623092651367
production_forward grad[67] vs paper_forward: mean_abs=0.5449066162109375, max_abs=4.75, mean_rel=0.2260369211435318, max_rel=1593.7498779296875, norm_rel=0.022372549399733543, ref_abs_avg=24.28000259399414, test_abs_avg=24.276248931884766
production_forward grad[68] vs paper_forward: mean_abs=0.4406163692474365, max_abs=1.65625, mean_rel=0.2409728765487671, max_rel=26.738494873046875, norm_rel=0.022724375128746033, ref_abs_avg=19.585926055908203, test_abs_avg=19.589019775390625
production_forward grad[69] vs paper_forward: mean_abs=0.5572205781936646, max_abs=4.625, mean_rel=0.14412575960159302, max_rel=612.479736328125, norm_rel=0.02331225387752056, ref_abs_avg=23.9407901763916, test_abs_avg=23.9403076171875
production_forward grad[70] vs paper_forward: mean_abs=0.5169080495834351, max_abs=3.5625, mean_rel=0.2532881498336792, max_rel=1843.7498779296875, norm_rel=0.02203490026295185, ref_abs_avg=23.451183319091797, test_abs_avg=23.461347579956055
production_forward grad[71] vs paper_forward: mean_abs=0.39270472526550293, max_abs=1.5, mean_rel=0.11433757841587067, max_rel=13.068227767944336, norm_rel=0.020785655826330185, ref_abs_avg=19.17987823486328, test_abs_avg=19.165800094604492
production_forward grad[72] vs paper_forward: mean_abs=0.5313392877578735, max_abs=4.0, mean_rel=0.15325477719306946, max_rel=1282.7384033203125, norm_rel=0.022871108725667, ref_abs_avg=23.254371643066406, test_abs_avg=23.25584602355957
production_forward grad[73] vs paper_forward: mean_abs=0.49236515164375305, max_abs=4.0, mean_rel=0.21280132234096527, max_rel=1718.7498779296875, norm_rel=0.021631207317113876, ref_abs_avg=22.80669403076172, test_abs_avg=22.811992645263672
production_forward grad[74] vs paper_forward: mean_abs=0.45283234119415283, max_abs=2.25, mean_rel=0.07734499126672745, max_rel=3.119025945663452, norm_rel=0.022059371694922447, ref_abs_avg=20.350566864013672, test_abs_avg=20.394933700561523
production_forward grad[75] vs paper_forward: mean_abs=0.5854726433753967, max_abs=4.78125, mean_rel=0.15383246541023254, max_rel=806.818115234375, norm_rel=0.024721762165427208, ref_abs_avg=23.71868133544922, test_abs_avg=23.720386505126953
production_forward grad[76] vs paper_forward: mean_abs=0.535799503326416, max_abs=4.5, mean_rel=0.2808782458305359, max_rel=1687.4998779296875, norm_rel=0.02308559976518154, ref_abs_avg=23.161487579345703, test_abs_avg=23.16394805908203
production_forward grad[77] vs paper_forward: mean_abs=0.4333444833755493, max_abs=1.76171875, mean_rel=0.15952828526496887, max_rel=24.61172103881836, norm_rel=0.022734511643648148, ref_abs_avg=18.934097290039062, test_abs_avg=18.932220458984375
production_forward grad[78] vs paper_forward: mean_abs=0.545069694519043, max_abs=4.5, mean_rel=0.15905526280403137, max_rel=1601.5802001953125, norm_rel=0.023945234715938568, ref_abs_avg=22.76857566833496, test_abs_avg=22.77061653137207
production_forward grad[79] vs paper_forward: mean_abs=0.500862181186676, max_abs=3.8125, mean_rel=0.2382848858833313, max_rel=1906.2498779296875, norm_rel=0.022901788353919983, ref_abs_avg=21.842647552490234, test_abs_avg=21.839797973632812
production_forward grad[80] vs paper_forward: mean_abs=0.367276668548584, max_abs=1.25390625, mean_rel=0.0758374035358429, max_rel=2.2079050540924072, norm_rel=0.020170288160443306, ref_abs_avg=18.13812828063965, test_abs_avg=18.12120819091797
production_forward grad[81] vs paper_forward: mean_abs=0.500059962272644, max_abs=4.5, mean_rel=0.14143186807632446, max_rel=1066.1151123046875, norm_rel=0.023456530645489693, ref_abs_avg=21.38694190979004, test_abs_avg=21.38799476623535
production_forward grad[82] vs paper_forward: mean_abs=0.466718852519989, max_abs=4.0, mean_rel=0.20084601640701294, max_rel=1624.9998779296875, norm_rel=0.02193378657102585, ref_abs_avg=21.231689453125, test_abs_avg=21.229454040527344
production_forward grad[83] vs paper_forward: mean_abs=0.3703620433807373, max_abs=1.375, mean_rel=0.13576382398605347, max_rel=16.58399772644043, norm_rel=0.022623756900429726, ref_abs_avg=16.200519561767578, test_abs_avg=16.23784637451172
production_forward grad[84] vs paper_forward: mean_abs=0.47242268919944763, max_abs=4.5, mean_rel=0.13808229565620422, max_rel=540.783203125, norm_rel=0.023113712668418884, ref_abs_avg=20.529434204101562, test_abs_avg=20.531858444213867
production_forward grad[85] vs paper_forward: mean_abs=0.43151700496673584, max_abs=4.5, mean_rel=0.21227584779262543, max_rel=1359.3748779296875, norm_rel=0.021457206457853317, ref_abs_avg=20.168514251708984, test_abs_avg=20.174163818359375
production_forward grad[86] vs paper_forward: mean_abs=0.3715047836303711, max_abs=1.5, mean_rel=0.09674204140901566, max_rel=6.6569437980651855, norm_rel=0.02257411740720272, ref_abs_avg=16.63687515258789, test_abs_avg=16.64946937561035
production_forward grad[87] vs paper_forward: mean_abs=0.4455168843269348, max_abs=4.5, mean_rel=0.134426087141037, max_rel=590.7275390625, norm_rel=0.02266121841967106, ref_abs_avg=19.7703800201416, test_abs_avg=19.77223014831543
production_forward grad[88] vs paper_forward: mean_abs=0.4070540964603424, max_abs=3.875, mean_rel=0.16276055574417114, max_rel=1187.5, norm_rel=0.02017282135784626, ref_abs_avg=20.136638641357422, test_abs_avg=20.14139175415039
production_forward grad[89] vs paper_forward: mean_abs=0.3538849353790283, max_abs=1.5, mean_rel=0.12747260928153992, max_rel=27.067031860351562, norm_rel=0.021419961005449295, ref_abs_avg=16.61333465576172, test_abs_avg=16.585054397583008
production_forward grad[90] vs paper_forward: mean_abs=0.4242118000984192, max_abs=6.0, mean_rel=0.13460347056388855, max_rel=659.8638305664062, norm_rel=0.02217642404139042, ref_abs_avg=19.270347595214844, test_abs_avg=19.271671295166016
production_forward grad[91] vs paper_forward: mean_abs=0.38681063055992126, max_abs=3.5, mean_rel=0.1768437623977661, max_rel=1296.8748779296875, norm_rel=0.02031945437192917, ref_abs_avg=19.06947898864746, test_abs_avg=19.075239181518555
production_forward grad[92] vs paper_forward: mean_abs=0.30743932723999023, max_abs=1.375, mean_rel=0.1654708832502365, max_rel=35.979679107666016, norm_rel=0.019397098571062088, ref_abs_avg=16.33472442626953, test_abs_avg=16.340938568115234
production_forward grad[93] vs paper_forward: mean_abs=0.3955245614051819, max_abs=5.0, mean_rel=0.1280735433101654, max_rel=509.2669372558594, norm_rel=0.02162330411374569, ref_abs_avg=18.52318572998047, test_abs_avg=18.5244140625
production_forward grad[94] vs paper_forward: mean_abs=0.36666691303253174, max_abs=4.0, mean_rel=0.16394636034965515, max_rel=1437.4998779296875, norm_rel=0.020093901082873344, ref_abs_avg=18.496227264404297, test_abs_avg=18.49152183532715
production_forward grad[95] vs paper_forward: mean_abs=0.31070947647094727, max_abs=1.25, mean_rel=0.10545015335083008, max_rel=7.560337543487549, norm_rel=0.019940821453928947, ref_abs_avg=15.836246490478516, test_abs_avg=15.833780288696289
production_forward grad[96] vs paper_forward: mean_abs=0.38468119502067566, max_abs=5.0, mean_rel=0.1336909830570221, max_rel=653.0397338867188, norm_rel=0.02132991887629032, ref_abs_avg=18.322147369384766, test_abs_avg=18.32379150390625
production_forward grad[97] vs paper_forward: mean_abs=0.33920252323150635, max_abs=4.453125, mean_rel=0.1496196687221527, max_rel=1390.6248779296875, norm_rel=0.018795862793922424, ref_abs_avg=18.27035903930664, test_abs_avg=18.27344512939453
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016893367283046246, max_abs=0.05078125
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008901778608560562, max_abs=0.625, mean_rel=0.07534845918416977, max_rel=127.6214828491211, norm_rel=0.020570622757077217, ref_abs_avg=0.4674391746520996, test_abs_avg=0.46744847297668457
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.590754985809326, max_abs=62.0, mean_rel=0.17663006484508514, max_rel=472.5754699707031, norm_rel=0.02124100551009178, ref_abs_avg=322.45733642578125, test_abs_avg=322.6021728515625
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3013725280761719, max_abs=5.0, mean_rel=0.19686375558376312, max_rel=44.0717658996582, norm_rel=0.023383475840091705, ref_abs_avg=56.52017593383789, test_abs_avg=56.50407409667969
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.694338083267212, max_abs=11.0, mean_rel=0.18763890862464905, max_rel=1837.260986328125, norm_rel=0.02521771751344204, ref_abs_avg=67.65940856933594, test_abs_avg=67.6670150756836
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5642578601837158, max_abs=10.5625, mean_rel=0.4498264193534851, max_rel=5062.5, norm_rel=0.023746652528643608, ref_abs_avg=66.2699203491211, test_abs_avg=66.27947998046875
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.1531352996826172, max_abs=4.25, mean_rel=0.08069916069507599, max_rel=6.5898051261901855, norm_rel=0.022811148315668106, ref_abs_avg=51.672035217285156, test_abs_avg=51.5968017578125
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4812312126159668, max_abs=9.0, mean_rel=0.17358195781707764, max_rel=1782.756103515625, norm_rel=0.02492964267730713, ref_abs_avg=59.84806823730469, test_abs_avg=59.855430603027344
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3783818483352661, max_abs=9.4375, mean_rel=0.41909390687942505, max_rel=4531.25, norm_rel=0.023560497909784317, ref_abs_avg=58.78927230834961, test_abs_avg=58.8019905090332
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=1.0296306610107422, max_abs=4.0, mean_rel=0.08320608735084534, max_rel=3.7664177417755127, norm_rel=0.02489416114985943, ref_abs_avg=42.16180419921875, test_abs_avg=42.195167541503906
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3348007202148438, max_abs=8.5, mean_rel=0.1683366894721985, max_rel=3136.196044921875, norm_rel=0.024668699130415916, ref_abs_avg=54.469337463378906, test_abs_avg=54.476173400878906
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.227625846862793, max_abs=8.25, mean_rel=0.3757913112640381, max_rel=4406.25, norm_rel=0.022944822907447815, ref_abs_avg=53.84663391113281, test_abs_avg=53.853904724121094
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9465265274047852, max_abs=3.75, mean_rel=0.11093363165855408, max_rel=8.462965965270996, norm_rel=0.022900238633155823, ref_abs_avg=41.378841400146484, test_abs_avg=41.366336822509766
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2356925010681152, max_abs=9.0, mean_rel=0.1781512349843979, max_rel=1897.3154296875, norm_rel=0.024441825225949287, ref_abs_avg=50.82490539550781, test_abs_avg=50.82850646972656
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1395998001098633, max_abs=8.0, mean_rel=0.3137838840484619, max_rel=3312.499755859375, norm_rel=0.022638214752078056, ref_abs_avg=50.50739288330078, test_abs_avg=50.511566162109375
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8789381980895996, max_abs=3.25, mean_rel=0.06493647396564484, max_rel=3.5225179195404053, norm_rel=0.02217469923198223, ref_abs_avg=41.30524444580078, test_abs_avg=41.29367446899414
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1557725667953491, max_abs=8.0, mean_rel=0.17220550775527954, max_rel=3578.216064453125, norm_rel=0.024311058223247528, ref_abs_avg=47.83376693725586, test_abs_avg=47.837066650390625
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0705444812774658, max_abs=6.25, mean_rel=0.38995200395584106, max_rel=5125.0, norm_rel=0.022759445011615753, ref_abs_avg=47.21449661254883, test_abs_avg=47.21514892578125
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8817739486694336, max_abs=3.5, mean_rel=0.083431176841259, max_rel=5.492415904998779, norm_rel=0.023584064096212387, ref_abs_avg=37.06631851196289, test_abs_avg=37.01569747924805
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0944446325302124, max_abs=7.0, mean_rel=0.1551382839679718, max_rel=1334.7149658203125, norm_rel=0.024109726771712303, ref_abs_avg=45.646175384521484, test_abs_avg=45.64764404296875
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=1.002607822418213, max_abs=7.0, mean_rel=0.29272183775901794, max_rel=2343.75, norm_rel=0.02255636639893055, ref_abs_avg=44.634307861328125, test_abs_avg=44.636783599853516
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7502461671829224, max_abs=3.2265625, mean_rel=0.13244014978408813, max_rel=18.653100967407227, norm_rel=0.02082142047584057, ref_abs_avg=36.09153366088867, test_abs_avg=36.02861785888672
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0308445692062378, max_abs=7.0, mean_rel=0.1582467257976532, max_rel=1610.901611328125, norm_rel=0.023937078192830086, ref_abs_avg=43.35321807861328, test_abs_avg=43.35633850097656
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9513502717018127, max_abs=6.0, mean_rel=0.29606378078460693, max_rel=2999.999755859375, norm_rel=0.02242456004023552, ref_abs_avg=42.6602783203125, test_abs_avg=42.661293029785156
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7240307331085205, max_abs=3.03125, mean_rel=0.07428213953971863, max_rel=10.689353942871094, norm_rel=0.0215668473392725, ref_abs_avg=34.60549545288086, test_abs_avg=34.60945510864258
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9825844764709473, max_abs=7.0, mean_rel=0.15054062008857727, max_rel=1767.5728759765625, norm_rel=0.023915255442261696, ref_abs_avg=41.29863739013672, test_abs_avg=41.30342483520508
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.9095425605773926, max_abs=6.0, mean_rel=0.27976229786872864, max_rel=3562.499755859375, norm_rel=0.022250860929489136, ref_abs_avg=41.05302047729492, test_abs_avg=41.04942321777344
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8805744647979736, max_abs=3.5, mean_rel=0.20271998643875122, max_rel=47.434871673583984, norm_rel=0.025045664981007576, ref_abs_avg=35.17753219604492, test_abs_avg=35.16912078857422
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1197550296783447, max_abs=9.5, mean_rel=0.17513832449913025, max_rel=1789.178955078125, norm_rel=0.02567474916577339, ref_abs_avg=43.8392448425293, test_abs_avg=43.8438835144043
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0428597927093506, max_abs=6.25, mean_rel=0.40512508153915405, max_rel=4281.25, norm_rel=0.02415372245013714, ref_abs_avg=43.37293243408203, test_abs_avg=43.38130569458008
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.837982177734375, max_abs=3.5, mean_rel=0.5872176885604858, max_rel=201.73306274414062, norm_rel=0.0247068889439106, ref_abs_avg=32.582542419433594, test_abs_avg=32.617706298828125
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.0430208444595337, max_abs=8.0, mean_rel=0.16703654825687408, max_rel=1528.915283203125, norm_rel=0.02585003152489662, ref_abs_avg=40.524478912353516, test_abs_avg=40.5286865234375
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9783450961112976, max_abs=6.875, mean_rel=0.319393515586853, max_rel=3437.499755859375, norm_rel=0.024551022797822952, ref_abs_avg=39.95964050292969, test_abs_avg=39.961097717285156
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7590823173522949, max_abs=3.25, mean_rel=0.09235339611768723, max_rel=16.957040786743164, norm_rel=0.025053586810827255, ref_abs_avg=30.84677505493164, test_abs_avg=30.865447998046875
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9700313806533813, max_abs=6.5, mean_rel=0.170948326587677, max_rel=1128.140380859375, norm_rel=0.025892063975334167, ref_abs_avg=37.64191818237305, test_abs_avg=37.64349365234375
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.9107014536857605, max_abs=5.5, mean_rel=0.30827754735946655, max_rel=2624.999755859375, norm_rel=0.02449619211256504, ref_abs_avg=37.26817321777344, test_abs_avg=37.270790100097656
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.7292671203613281, max_abs=3.0625, mean_rel=0.14567384123802185, max_rel=14.983004570007324, norm_rel=0.024307291954755783, ref_abs_avg=30.152812957763672, test_abs_avg=30.161231994628906
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.9210118055343628, max_abs=6.25, mean_rel=0.16019240021705627, max_rel=946.3362426757812, norm_rel=0.0255520548671484, ref_abs_avg=36.175132751464844, test_abs_avg=36.17987823486328
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.860042929649353, max_abs=5.25, mean_rel=0.3152196407318115, max_rel=3234.374755859375, norm_rel=0.02438655123114586, ref_abs_avg=35.35529327392578, test_abs_avg=35.35559844970703
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.707403838634491, max_abs=3.0, mean_rel=0.08157312870025635, max_rel=2.7964205741882324, norm_rel=0.026049355044960976, ref_abs_avg=27.759523391723633, test_abs_avg=27.731250762939453
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8659214973449707, max_abs=6.0, mean_rel=0.18050616979599, max_rel=2186.71826171875, norm_rel=0.02532714605331421, ref_abs_avg=34.28187942504883, test_abs_avg=34.28413391113281
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.8039615154266357, max_abs=4.5625, mean_rel=0.34359854459762573, max_rel=2531.25, norm_rel=0.023797770962119102, ref_abs_avg=33.829437255859375, test_abs_avg=33.839256286621094
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.6688566207885742, max_abs=2.75, mean_rel=0.08558769524097443, max_rel=11.003698348999023, norm_rel=0.023608844727277756, ref_abs_avg=28.64440155029297, test_abs_avg=28.66855239868164
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.8181162476539612, max_abs=6.0, mean_rel=0.17211641371250153, max_rel=1783.6224365234375, norm_rel=0.025120392441749573, ref_abs_avg=32.640724182128906, test_abs_avg=32.64229202270508
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.765329897403717, max_abs=5.0, mean_rel=0.2779659330844879, max_rel=3015.624755859375, norm_rel=0.023710470646619797, ref_abs_avg=32.32013702392578, test_abs_avg=32.320777893066406
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5777206420898438, max_abs=2.25, mean_rel=0.3748749792575836, max_rel=114.0841293334961, norm_rel=0.023501010611653328, ref_abs_avg=25.098264694213867, test_abs_avg=25.09404945373535
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7809900045394897, max_abs=6.0, mean_rel=0.16416814923286438, max_rel=1126.205810546875, norm_rel=0.024822412058711052, ref_abs_avg=31.541820526123047, test_abs_avg=31.544361114501953
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.727313756942749, max_abs=5.0, mean_rel=0.28779512643814087, max_rel=2093.75, norm_rel=0.023218601942062378, ref_abs_avg=31.29827117919922, test_abs_avg=31.296337127685547
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5749025344848633, max_abs=2.40625, mean_rel=0.11006860435009003, max_rel=9.640120506286621, norm_rel=0.022978438064455986, ref_abs_avg=24.916790008544922, test_abs_avg=24.8898868560791
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7478058338165283, max_abs=5.5, mean_rel=0.16602525115013123, max_rel=1175.6348876953125, norm_rel=0.0245989840477705, ref_abs_avg=30.514930725097656, test_abs_avg=30.51655387878418
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6926367878913879, max_abs=5.0, mean_rel=0.2334749400615692, max_rel=1781.2498779296875, norm_rel=0.022643985226750374, ref_abs_avg=30.528940200805664, test_abs_avg=30.5328426361084
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6498591899871826, max_abs=2.75, mean_rel=0.09186890721321106, max_rel=3.6028661727905273, norm_rel=0.02391352690756321, ref_abs_avg=26.441181182861328, test_abs_avg=26.485958099365234
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8518386483192444, max_abs=7.0, mean_rel=0.1696237325668335, max_rel=898.0535888671875, norm_rel=0.026056939736008644, ref_abs_avg=32.79296112060547, test_abs_avg=32.792945861816406
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7923256158828735, max_abs=5.25, mean_rel=0.32397425174713135, max_rel=2140.625, norm_rel=0.02502712607383728, ref_abs_avg=31.69656753540039, test_abs_avg=31.689090728759766
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.6245384216308594, max_abs=2.5, mean_rel=0.11501066386699677, max_rel=10.220924377441406, norm_rel=0.02614085003733635, ref_abs_avg=23.763731002807617, test_abs_avg=23.77829933166504
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7736386060714722, max_abs=6.0, mean_rel=0.1666565239429474, max_rel=814.9861450195312, norm_rel=0.025693954899907112, ref_abs_avg=30.182022094726562, test_abs_avg=30.182680130004883
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.716867208480835, max_abs=4.75, mean_rel=0.2883201837539673, max_rel=2687.499755859375, norm_rel=0.024134142324328423, ref_abs_avg=29.71929168701172, test_abs_avg=29.720752716064453
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5621044635772705, max_abs=2.1875, mean_rel=0.13300684094429016, max_rel=24.40837287902832, norm_rel=0.023144053295254707, ref_abs_avg=24.213947296142578, test_abs_avg=24.179513931274414
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.7185471057891846, max_abs=6.0, mean_rel=0.1554965078830719, max_rel=1047.32177734375, norm_rel=0.025115683674812317, ref_abs_avg=28.638219833374023, test_abs_avg=28.64049530029297
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6619608402252197, max_abs=4.875, mean_rel=0.2999843657016754, max_rel=2437.5, norm_rel=0.02351374924182892, ref_abs_avg=28.098445892333984, test_abs_avg=28.094165802001953
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5309104919433594, max_abs=2.125, mean_rel=0.11159443110227585, max_rel=12.090483665466309, norm_rel=0.02482389472424984, ref_abs_avg=21.586933135986328, test_abs_avg=21.579017639160156
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6667205095291138, max_abs=5.0, mean_rel=0.15998995304107666, max_rel=1420.9930419921875, norm_rel=0.024792691692709923, ref_abs_avg=26.92159652709961, test_abs_avg=26.922019958496094
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6201335787773132, max_abs=4.25, mean_rel=0.2432934045791626, max_rel=1414.0623779296875, norm_rel=0.02340870536863804, ref_abs_avg=26.438827514648438, test_abs_avg=26.437021255493164
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.49103766679763794, max_abs=1.796875, mean_rel=0.3499709665775299, max_rel=114.38050842285156, norm_rel=0.022904319688677788, ref_abs_avg=20.833148956298828, test_abs_avg=20.885967254638672
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6276875734329224, max_abs=4.5625, mean_rel=0.15687668323516846, max_rel=1346.5433349609375, norm_rel=0.024210134521126747, ref_abs_avg=25.910940170288086, test_abs_avg=25.91031265258789
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5789800882339478, max_abs=4.6875, mean_rel=0.22477421164512634, max_rel=1937.4998779296875, norm_rel=0.022849800065159798, ref_abs_avg=25.379470825195312, test_abs_avg=25.376529693603516
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.4499077796936035, max_abs=2.3125, mean_rel=0.11081845313310623, max_rel=19.009145736694336, norm_rel=0.02340022847056389, ref_abs_avg=19.992420196533203, test_abs_avg=19.9644832611084
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5863770246505737, max_abs=6.0, mean_rel=0.15165358781814575, max_rel=828.260498046875, norm_rel=0.02403847873210907, ref_abs_avg=24.39061737060547, test_abs_avg=24.391891479492188
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5508150458335876, max_abs=4.5, mean_rel=0.22821637988090515, max_rel=1374.9998779296875, norm_rel=0.02262476086616516, ref_abs_avg=24.28000259399414, test_abs_avg=24.27597999572754
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.4403069019317627, max_abs=1.578125, mean_rel=0.23250089585781097, max_rel=32.13964080810547, norm_rel=0.022380128502845764, ref_abs_avg=19.585926055908203, test_abs_avg=19.594127655029297
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5619146823883057, max_abs=4.125, mean_rel=0.14689522981643677, max_rel=601.8192749023438, norm_rel=0.0234831590205431, ref_abs_avg=23.9407901763916, test_abs_avg=23.940242767333984
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.520897626876831, max_abs=3.75, mean_rel=0.25311368703842163, max_rel=1562.4998779296875, norm_rel=0.022202948108315468, ref_abs_avg=23.451183319091797, test_abs_avg=23.460988998413086
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.4227898120880127, max_abs=1.625, mean_rel=0.08110614120960236, max_rel=7.396147727966309, norm_rel=0.02202947810292244, ref_abs_avg=19.17987823486328, test_abs_avg=19.173301696777344
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5345885157585144, max_abs=4.0, mean_rel=0.1508481204509735, max_rel=956.8821411132812, norm_rel=0.023003248497843742, ref_abs_avg=23.254371643066406, test_abs_avg=23.25641632080078
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.49868786334991455, max_abs=4.0, mean_rel=0.22854456305503845, max_rel=2265.625, norm_rel=0.021931009367108345, ref_abs_avg=22.80669403076172, test_abs_avg=22.81443214416504
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4374208450317383, max_abs=2.125, mean_rel=0.08489343523979187, max_rel=5.643945693969727, norm_rel=0.021793628111481667, ref_abs_avg=20.350566864013672, test_abs_avg=20.402799606323242
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5912485122680664, max_abs=4.5, mean_rel=0.15219083428382874, max_rel=688.2410278320312, norm_rel=0.02496335655450821, ref_abs_avg=23.71868133544922, test_abs_avg=23.719829559326172
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5443634986877441, max_abs=4.25, mean_rel=0.2687673568725586, max_rel=2687.499755859375, norm_rel=0.02349005453288555, ref_abs_avg=23.161487579345703, test_abs_avg=23.162750244140625
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.420032262802124, max_abs=1.68359375, mean_rel=0.13975629210472107, max_rel=12.51185131072998, norm_rel=0.022440331056714058, ref_abs_avg=18.934097290039062, test_abs_avg=18.950439453125
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5504371523857117, max_abs=5.0, mean_rel=0.1611759513616562, max_rel=1141.91064453125, norm_rel=0.02417851984500885, ref_abs_avg=22.76857566833496, test_abs_avg=22.770666122436523
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.5064879655838013, max_abs=3.8125, mean_rel=0.24704331159591675, max_rel=1765.6248779296875, norm_rel=0.023157624527812004, ref_abs_avg=21.842647552490234, test_abs_avg=21.843265533447266
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.36826276779174805, max_abs=1.53515625, mean_rel=0.06999554485082626, max_rel=2.268319845199585, norm_rel=0.02037503570318222, ref_abs_avg=18.13812828063965, test_abs_avg=18.127466201782227
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.5047959089279175, max_abs=5.0, mean_rel=0.14660775661468506, max_rel=1024.03564453125, norm_rel=0.023661864921450615, ref_abs_avg=21.38694190979004, test_abs_avg=21.387908935546875
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.47194093465805054, max_abs=4.75, mean_rel=0.1906648725271225, max_rel=1187.5, norm_rel=0.022209761664271355, ref_abs_avg=21.231689453125, test_abs_avg=21.234981536865234
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.38137364387512207, max_abs=1.546875, mean_rel=0.11678466200828552, max_rel=8.579320907592773, norm_rel=0.023638451471924782, ref_abs_avg=16.200519561767578, test_abs_avg=16.25222396850586
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4762158691883087, max_abs=4.25, mean_rel=0.14025171101093292, max_rel=476.4439697265625, norm_rel=0.023283423855900764, ref_abs_avg=20.529434204101562, test_abs_avg=20.53199005126953
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4343011677265167, max_abs=4.0, mean_rel=0.20446550846099854, max_rel=1203.125, norm_rel=0.02156515046954155, ref_abs_avg=20.168514251708984, test_abs_avg=20.174015045166016
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.36807894706726074, max_abs=1.625, mean_rel=0.09008835256099701, max_rel=3.725975275039673, norm_rel=0.02240125462412834, ref_abs_avg=16.63687515258789, test_abs_avg=16.65067481994629
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.4487206041812897, max_abs=4.5, mean_rel=0.13431459665298462, max_rel=750.751708984375, norm_rel=0.02279364876449108, ref_abs_avg=19.7703800201416, test_abs_avg=19.77227020263672
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4106912612915039, max_abs=4.25, mean_rel=0.1544228196144104, max_rel=765.6249389648438, norm_rel=0.020351916551589966, ref_abs_avg=20.136638641357422, test_abs_avg=20.141582489013672
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.35394930839538574, max_abs=1.25, mean_rel=0.1626071333885193, max_rel=44.47361755371094, norm_rel=0.021812178194522858, ref_abs_avg=16.61333465576172, test_abs_avg=16.59107780456543
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.42636996507644653, max_abs=6.0, mean_rel=0.1364874541759491, max_rel=721.0531005859375, norm_rel=0.022283200174570084, ref_abs_avg=19.270347595214844, test_abs_avg=19.271377563476562
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.388332724571228, max_abs=3.9375, mean_rel=0.17543581128120422, max_rel=1359.3748779296875, norm_rel=0.02037777379155159, ref_abs_avg=19.06947898864746, test_abs_avg=19.07135581970215
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.30880212783813477, max_abs=1.25, mean_rel=0.20463421940803528, max_rel=43.528629302978516, norm_rel=0.01933477073907852, ref_abs_avg=16.33472442626953, test_abs_avg=16.340473175048828
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.3973706364631653, max_abs=5.0, mean_rel=0.1316477507352829, max_rel=546.703857421875, norm_rel=0.021734310314059258, ref_abs_avg=18.52318572998047, test_abs_avg=18.524776458740234
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3683764636516571, max_abs=4.0, mean_rel=0.16205784678459167, max_rel=1374.9998779296875, norm_rel=0.02024981752038002, ref_abs_avg=18.496227264404297, test_abs_avg=18.493736267089844
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.3169426918029785, max_abs=1.25, mean_rel=0.10199765861034393, max_rel=6.43355655670166, norm_rel=0.020004544407129288, ref_abs_avg=15.836246490478516, test_abs_avg=15.815474510192871
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3855438530445099, max_abs=5.0, mean_rel=0.12998825311660767, max_rel=718.2959594726562, norm_rel=0.021374549716711044, ref_abs_avg=18.322147369384766, test_abs_avg=18.32375717163086
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.3420358896255493, max_abs=5.0, mean_rel=0.14521317183971405, max_rel=999.9999389648438, norm_rel=0.018999898806214333, ref_abs_avg=18.27035903930664, test_abs_avg=18.27201271057129
production_forward2 vs paper_forward output: mean_abs=0.0016872197156772017, max_abs=0.05078125
production_forward2 grad[0] vs paper_forward: mean_abs=0.008551599457859993, max_abs=0.5625, mean_rel=0.07273846864700317, max_rel=107.10929107666016, norm_rel=0.019876210018992424, ref_abs_avg=0.4674391746520996, test_abs_avg=0.46745866537094116
production_forward2 grad[1] vs paper_forward: mean_abs=7.474143981933594, max_abs=64.0, mean_rel=0.1737779825925827, max_rel=504.2858581542969, norm_rel=0.020953230559825897, ref_abs_avg=322.45733642578125, test_abs_avg=322.57861328125
production_forward2 grad[2] vs paper_forward: mean_abs=1.3420658111572266, max_abs=5.0, mean_rel=0.25467759370803833, max_rel=80.53822326660156, norm_rel=0.023841671645641327, ref_abs_avg=56.52017593383789, test_abs_avg=56.48738098144531
production_forward2 grad[3] vs paper_forward: mean_abs=1.6457786560058594, max_abs=11.0, mean_rel=0.1798223853111267, max_rel=4514.17919921875, norm_rel=0.024486646056175232, ref_abs_avg=67.65940856933594, test_abs_avg=67.67076110839844
production_forward2 grad[4] vs paper_forward: mean_abs=1.505303144454956, max_abs=9.0, mean_rel=0.4671463966369629, max_rel=4750.0, norm_rel=0.022871345281600952, ref_abs_avg=66.2699203491211, test_abs_avg=66.27244567871094
production_forward2 grad[5] vs paper_forward: mean_abs=1.1136054992675781, max_abs=4.5, mean_rel=0.07916216552257538, max_rel=5.741725921630859, norm_rel=0.02235371805727482, ref_abs_avg=51.672035217285156, test_abs_avg=51.63356399536133
production_forward2 grad[6] vs paper_forward: mean_abs=1.4368915557861328, max_abs=12.0, mean_rel=0.16988489031791687, max_rel=1496.5872802734375, norm_rel=0.024196049198508263, ref_abs_avg=59.84806823730469, test_abs_avg=59.85609436035156
production_forward2 grad[7] vs paper_forward: mean_abs=1.3311927318572998, max_abs=8.6875, mean_rel=0.44665777683258057, max_rel=4875.0, norm_rel=0.02277240715920925, ref_abs_avg=58.78927230834961, test_abs_avg=58.80188751220703
production_forward2 grad[8] vs paper_forward: mean_abs=0.9653756618499756, max_abs=4.0, mean_rel=0.09152165055274963, max_rel=6.889003753662109, norm_rel=0.023343486711382866, ref_abs_avg=42.16180419921875, test_abs_avg=42.20882797241211
production_forward2 grad[9] vs paper_forward: mean_abs=1.2969467639923096, max_abs=9.0, mean_rel=0.15823394060134888, max_rel=2696.779541015625, norm_rel=0.02398495562374592, ref_abs_avg=54.469337463378906, test_abs_avg=54.475929260253906
production_forward2 grad[10] vs paper_forward: mean_abs=1.1888290643692017, max_abs=7.5, mean_rel=0.3724581301212311, max_rel=3312.499755859375, norm_rel=0.022216137498617172, ref_abs_avg=53.84663391113281, test_abs_avg=53.85822677612305
production_forward2 grad[11] vs paper_forward: mean_abs=0.9061527252197266, max_abs=4.0, mean_rel=0.14103491604328156, max_rel=15.799010276794434, norm_rel=0.021554026752710342, ref_abs_avg=41.378841400146484, test_abs_avg=41.35517501831055
production_forward2 grad[12] vs paper_forward: mean_abs=1.202732801437378, max_abs=9.0, mean_rel=0.17177915573120117, max_rel=2159.06298828125, norm_rel=0.02380417101085186, ref_abs_avg=50.82490539550781, test_abs_avg=50.828277587890625
production_forward2 grad[13] vs paper_forward: mean_abs=1.1064610481262207, max_abs=7.5, mean_rel=0.29513779282569885, max_rel=2562.5, norm_rel=0.02200656570494175, ref_abs_avg=50.50739288330078, test_abs_avg=50.514862060546875
production_forward2 grad[14] vs paper_forward: mean_abs=0.8920211791992188, max_abs=3.125, mean_rel=0.0727154016494751, max_rel=6.376576900482178, norm_rel=0.021959960460662842, ref_abs_avg=41.30524444580078, test_abs_avg=41.29331970214844
production_forward2 grad[15] vs paper_forward: mean_abs=1.128957748413086, max_abs=8.0, mean_rel=0.16751982271671295, max_rel=3121.963134765625, norm_rel=0.023742860183119774, ref_abs_avg=47.83376693725586, test_abs_avg=47.83708190917969
production_forward2 grad[16] vs paper_forward: mean_abs=1.034358024597168, max_abs=6.25, mean_rel=0.3673507571220398, max_rel=4500.0, norm_rel=0.022004153579473495, ref_abs_avg=47.21449661254883, test_abs_avg=47.21709060668945
production_forward2 grad[17] vs paper_forward: mean_abs=0.8443450927734375, max_abs=3.0, mean_rel=0.07604789733886719, max_rel=4.132279396057129, norm_rel=0.022458327934145927, ref_abs_avg=37.06631851196289, test_abs_avg=37.020233154296875
production_forward2 grad[18] vs paper_forward: mean_abs=1.069697380065918, max_abs=7.0, mean_rel=0.15016043186187744, max_rel=1306.1986083984375, norm_rel=0.023577401414513588, ref_abs_avg=45.646175384521484, test_abs_avg=45.64996337890625
production_forward2 grad[19] vs paper_forward: mean_abs=0.9744071960449219, max_abs=6.625, mean_rel=0.3076031506061554, max_rel=2687.499755859375, norm_rel=0.02193848043680191, ref_abs_avg=44.634307861328125, test_abs_avg=44.63795852661133
production_forward2 grad[20] vs paper_forward: mean_abs=0.7489404678344727, max_abs=3.125, mean_rel=0.1016983836889267, max_rel=4.330184459686279, norm_rel=0.020960092544555664, ref_abs_avg=36.09153366088867, test_abs_avg=36.076873779296875
production_forward2 grad[21] vs paper_forward: mean_abs=1.0070570707321167, max_abs=7.0, mean_rel=0.15184970200061798, max_rel=2147.791259765625, norm_rel=0.023393943905830383, ref_abs_avg=43.35321807861328, test_abs_avg=43.357521057128906
production_forward2 grad[22] vs paper_forward: mean_abs=0.9267981052398682, max_abs=5.5, mean_rel=0.2834135890007019, max_rel=3312.499755859375, norm_rel=0.021840546280145645, ref_abs_avg=42.6602783203125, test_abs_avg=42.66387176513672
production_forward2 grad[23] vs paper_forward: mean_abs=0.7393093109130859, max_abs=3.15625, mean_rel=0.08367198705673218, max_rel=8.410897254943848, norm_rel=0.02224225178360939, ref_abs_avg=34.60549545288086, test_abs_avg=34.57575988769531
production_forward2 grad[24] vs paper_forward: mean_abs=0.9627800583839417, max_abs=7.0, mean_rel=0.14418098330497742, max_rel=866.320556640625, norm_rel=0.023451222106814384, ref_abs_avg=41.29863739013672, test_abs_avg=41.303077697753906
production_forward2 grad[25] vs paper_forward: mean_abs=0.88667893409729, max_abs=5.5, mean_rel=0.26691311597824097, max_rel=3624.999755859375, norm_rel=0.021697087213397026, ref_abs_avg=41.05302047729492, test_abs_avg=41.04827117919922
production_forward2 grad[26] vs paper_forward: mean_abs=0.8568909168243408, max_abs=3.25, mean_rel=0.17955173552036285, max_rel=40.4127082824707, norm_rel=0.024586042389273643, ref_abs_avg=35.17753219604492, test_abs_avg=35.15394973754883
production_forward2 grad[27] vs paper_forward: mean_abs=1.0935394763946533, max_abs=7.5, mean_rel=0.17454561591148376, max_rel=1484.7252197265625, norm_rel=0.025089876726269722, ref_abs_avg=43.8392448425293, test_abs_avg=43.84476852416992
production_forward2 grad[28] vs paper_forward: mean_abs=1.0180130004882812, max_abs=6.0, mean_rel=0.3981204330921173, max_rel=3328.124755859375, norm_rel=0.02358252927660942, ref_abs_avg=43.37293243408203, test_abs_avg=43.37947082519531
production_forward2 grad[29] vs paper_forward: mean_abs=0.8301988840103149, max_abs=3.25, mean_rel=0.4584965705871582, max_rel=142.06088256835938, norm_rel=0.024882972240447998, ref_abs_avg=32.582542419433594, test_abs_avg=32.608917236328125
production_forward2 grad[30] vs paper_forward: mean_abs=1.0203545093536377, max_abs=7.0, mean_rel=0.16266818344593048, max_rel=1339.7972412109375, norm_rel=0.025309044867753983, ref_abs_avg=40.524478912353516, test_abs_avg=40.53015899658203
production_forward2 grad[31] vs paper_forward: mean_abs=0.9560270309448242, max_abs=6.25, mean_rel=0.32149070501327515, max_rel=3124.999755859375, norm_rel=0.023993708193302155, ref_abs_avg=39.95964050292969, test_abs_avg=39.961509704589844
production_forward2 grad[32] vs paper_forward: mean_abs=0.7273745536804199, max_abs=3.5, mean_rel=0.10161270201206207, max_rel=19.764142990112305, norm_rel=0.024243805557489395, ref_abs_avg=30.84677505493164, test_abs_avg=30.844520568847656
production_forward2 grad[33] vs paper_forward: mean_abs=0.9519269466400146, max_abs=6.5, mean_rel=0.16984492540359497, max_rel=1159.0625, norm_rel=0.02542865090072155, ref_abs_avg=37.64191818237305, test_abs_avg=37.64471435546875
production_forward2 grad[34] vs paper_forward: mean_abs=0.8918027281761169, max_abs=5.5, mean_rel=0.279116153717041, max_rel=2531.25, norm_rel=0.02402743138372898, ref_abs_avg=37.26817321777344, test_abs_avg=37.27332305908203
production_forward2 grad[35] vs paper_forward: mean_abs=0.6918735504150391, max_abs=2.5, mean_rel=0.12169970571994781, max_rel=17.082374572753906, norm_rel=0.02347376197576523, ref_abs_avg=30.152812957763672, test_abs_avg=30.151004791259766
production_forward2 grad[36] vs paper_forward: mean_abs=0.9052906036376953, max_abs=6.5, mean_rel=0.15354010462760925, max_rel=963.443603515625, norm_rel=0.025111591443419456, ref_abs_avg=36.175132751464844, test_abs_avg=36.18016815185547
production_forward2 grad[37] vs paper_forward: mean_abs=0.8412792682647705, max_abs=5.25, mean_rel=0.3056311011314392, max_rel=2906.249755859375, norm_rel=0.023862840607762337, ref_abs_avg=35.35529327392578, test_abs_avg=35.35234451293945
production_forward2 grad[38] vs paper_forward: mean_abs=0.6737604141235352, max_abs=3.5, mean_rel=0.10972357541322708, max_rel=19.020904541015625, norm_rel=0.025171751156449318, ref_abs_avg=27.759523391723633, test_abs_avg=27.751575469970703
production_forward2 grad[39] vs paper_forward: mean_abs=0.8520711660385132, max_abs=5.5, mean_rel=0.1726064383983612, max_rel=1181.1961669921875, norm_rel=0.024906625971198082, ref_abs_avg=34.28187942504883, test_abs_avg=34.284645080566406
production_forward2 grad[40] vs paper_forward: mean_abs=0.7905338406562805, max_abs=4.75, mean_rel=0.32561126351356506, max_rel=2312.5, norm_rel=0.023410949856042862, ref_abs_avg=33.829437255859375, test_abs_avg=33.83741760253906
production_forward2 grad[41] vs paper_forward: mean_abs=0.6450376510620117, max_abs=2.59375, mean_rel=0.07118498533964157, max_rel=5.697805404663086, norm_rel=0.022955888882279396, ref_abs_avg=28.64440155029297, test_abs_avg=28.674720764160156
production_forward2 grad[42] vs paper_forward: mean_abs=0.8059067726135254, max_abs=5.0, mean_rel=0.16593816876411438, max_rel=1219.42626953125, norm_rel=0.024747470393776894, ref_abs_avg=32.640724182128906, test_abs_avg=32.64277648925781
production_forward2 grad[43] vs paper_forward: mean_abs=0.7510231137275696, max_abs=5.125, mean_rel=0.26123058795928955, max_rel=2187.5, norm_rel=0.02326369658112526, ref_abs_avg=32.32013702392578, test_abs_avg=32.320960998535156
production_forward2 grad[44] vs paper_forward: mean_abs=0.5742101669311523, max_abs=2.3125, mean_rel=0.26748213171958923, max_rel=49.920127868652344, norm_rel=0.022840505465865135, ref_abs_avg=25.098264694213867, test_abs_avg=25.096752166748047
production_forward2 grad[45] vs paper_forward: mean_abs=0.7697930335998535, max_abs=7.0, mean_rel=0.16364622116088867, max_rel=1067.6025390625, norm_rel=0.02446787618100643, ref_abs_avg=31.541820526123047, test_abs_avg=31.545408248901367
production_forward2 grad[46] vs paper_forward: mean_abs=0.7159220576286316, max_abs=5.0, mean_rel=0.27761608362197876, max_rel=2406.25, norm_rel=0.022858932614326477, ref_abs_avg=31.29827117919922, test_abs_avg=31.300458908081055
production_forward2 grad[47] vs paper_forward: mean_abs=0.5608310699462891, max_abs=2.21875, mean_rel=0.11113782227039337, max_rel=8.630021095275879, norm_rel=0.02235354669392109, ref_abs_avg=24.916790008544922, test_abs_avg=24.90578842163086
production_forward2 grad[48] vs paper_forward: mean_abs=0.7385985851287842, max_abs=6.0, mean_rel=0.16628201305866241, max_rel=937.790283203125, norm_rel=0.024302834644913673, ref_abs_avg=30.514930725097656, test_abs_avg=30.517047882080078
production_forward2 grad[49] vs paper_forward: mean_abs=0.6830429434776306, max_abs=5.0, mean_rel=0.22722077369689941, max_rel=1656.2498779296875, norm_rel=0.022349320352077484, ref_abs_avg=30.528940200805664, test_abs_avg=30.532907485961914
production_forward2 grad[50] vs paper_forward: mean_abs=0.6736202239990234, max_abs=2.75, mean_rel=0.08827956020832062, max_rel=4.394557476043701, norm_rel=0.024744898080825806, ref_abs_avg=26.441181182861328, test_abs_avg=26.535572052001953
production_forward2 grad[51] vs paper_forward: mean_abs=0.8388605117797852, max_abs=6.25, mean_rel=0.16807690262794495, max_rel=911.7340698242188, norm_rel=0.025675605982542038, ref_abs_avg=32.79296112060547, test_abs_avg=32.79286575317383
production_forward2 grad[52] vs paper_forward: mean_abs=0.7784501314163208, max_abs=5.375, mean_rel=0.2911250591278076, max_rel=2125.0, norm_rel=0.024600215256214142, ref_abs_avg=31.69656753540039, test_abs_avg=31.688190460205078
production_forward2 grad[53] vs paper_forward: mean_abs=0.5857810974121094, max_abs=2.0, mean_rel=0.11018098890781403, max_rel=7.639241695404053, norm_rel=0.02451466955244541, ref_abs_avg=23.763731002807617, test_abs_avg=23.748353958129883
production_forward2 grad[54] vs paper_forward: mean_abs=0.7628574371337891, max_abs=6.0, mean_rel=0.169779971241951, max_rel=1189.89404296875, norm_rel=0.025350559502840042, ref_abs_avg=30.182022094726562, test_abs_avg=30.183229446411133
production_forward2 grad[55] vs paper_forward: mean_abs=0.7055602073669434, max_abs=4.25, mean_rel=0.27687400579452515, max_rel=2203.125, norm_rel=0.023760676383972168, ref_abs_avg=29.71929168701172, test_abs_avg=29.723670959472656
production_forward2 grad[56] vs paper_forward: mean_abs=0.5450713634490967, max_abs=2.25, mean_rel=0.09172242879867554, max_rel=7.530482292175293, norm_rel=0.022699572145938873, ref_abs_avg=24.213947296142578, test_abs_avg=24.183673858642578
production_forward2 grad[57] vs paper_forward: mean_abs=0.7083698511123657, max_abs=6.0, mean_rel=0.1578013002872467, max_rel=1412.008056640625, norm_rel=0.024773171171545982, ref_abs_avg=28.638219833374023, test_abs_avg=28.64082908630371
production_forward2 grad[58] vs paper_forward: mean_abs=0.6511149406433105, max_abs=5.3125, mean_rel=0.2808467745780945, max_rel=2874.999755859375, norm_rel=0.023131215944886208, ref_abs_avg=28.098445892333984, test_abs_avg=28.092506408691406
production_forward2 grad[59] vs paper_forward: mean_abs=0.5373592376708984, max_abs=2.25, mean_rel=0.12976893782615662, max_rel=25.936038970947266, norm_rel=0.024710139259696007, ref_abs_avg=21.586933135986328, test_abs_avg=21.58340072631836
production_forward2 grad[60] vs paper_forward: mean_abs=0.6591979265213013, max_abs=5.25, mean_rel=0.15788626670837402, max_rel=1069.58349609375, norm_rel=0.024516670033335686, ref_abs_avg=26.92159652709961, test_abs_avg=26.921266555786133
production_forward2 grad[61] vs paper_forward: mean_abs=0.610370934009552, max_abs=4.5, mean_rel=0.24591809511184692, max_rel=1562.4998779296875, norm_rel=0.02304510399699211, ref_abs_avg=26.438827514648438, test_abs_avg=26.433921813964844
production_forward2 grad[62] vs paper_forward: mean_abs=0.4959786534309387, max_abs=2.0, mean_rel=0.25389644503593445, max_rel=70.07051086425781, norm_rel=0.022942831739783287, ref_abs_avg=20.833148956298828, test_abs_avg=20.868335723876953
production_forward2 grad[63] vs paper_forward: mean_abs=0.6206351518630981, max_abs=5.5, mean_rel=0.1534235179424286, max_rel=1372.43505859375, norm_rel=0.023971037939190865, ref_abs_avg=25.910940170288086, test_abs_avg=25.910259246826172
production_forward2 grad[64] vs paper_forward: mean_abs=0.5737791061401367, max_abs=4.375, mean_rel=0.22974181175231934, max_rel=1749.9998779296875, norm_rel=0.02262549102306366, ref_abs_avg=25.379470825195312, test_abs_avg=25.38043975830078
production_forward2 grad[65] vs paper_forward: mean_abs=0.45674800872802734, max_abs=1.75, mean_rel=0.11653141677379608, max_rel=17.018415451049805, norm_rel=0.02363068424165249, ref_abs_avg=19.992420196533203, test_abs_avg=19.966283798217773
production_forward2 grad[66] vs paper_forward: mean_abs=0.5809087753295898, max_abs=5.0, mean_rel=0.1493690311908722, max_rel=781.0452270507812, norm_rel=0.023827064782381058, ref_abs_avg=24.39061737060547, test_abs_avg=24.390623092651367
production_forward2 grad[67] vs paper_forward: mean_abs=0.5449066162109375, max_abs=4.75, mean_rel=0.2260369211435318, max_rel=1593.7498779296875, norm_rel=0.022372549399733543, ref_abs_avg=24.28000259399414, test_abs_avg=24.276248931884766
production_forward2 grad[68] vs paper_forward: mean_abs=0.4406163692474365, max_abs=1.65625, mean_rel=0.2409728765487671, max_rel=26.738494873046875, norm_rel=0.022724375128746033, ref_abs_avg=19.585926055908203, test_abs_avg=19.589019775390625
production_forward2 grad[69] vs paper_forward: mean_abs=0.5572205781936646, max_abs=4.625, mean_rel=0.14412575960159302, max_rel=612.479736328125, norm_rel=0.02331225387752056, ref_abs_avg=23.9407901763916, test_abs_avg=23.9403076171875
production_forward2 grad[70] vs paper_forward: mean_abs=0.5169080495834351, max_abs=3.5625, mean_rel=0.2532881498336792, max_rel=1843.7498779296875, norm_rel=0.02203490026295185, ref_abs_avg=23.451183319091797, test_abs_avg=23.461347579956055
production_forward2 grad[71] vs paper_forward: mean_abs=0.39270472526550293, max_abs=1.5, mean_rel=0.11433757841587067, max_rel=13.068227767944336, norm_rel=0.020785655826330185, ref_abs_avg=19.17987823486328, test_abs_avg=19.165800094604492
production_forward2 grad[72] vs paper_forward: mean_abs=0.5313392877578735, max_abs=4.0, mean_rel=0.15325477719306946, max_rel=1282.7384033203125, norm_rel=0.022871108725667, ref_abs_avg=23.254371643066406, test_abs_avg=23.25584602355957
production_forward2 grad[73] vs paper_forward: mean_abs=0.49236515164375305, max_abs=4.0, mean_rel=0.21280132234096527, max_rel=1718.7498779296875, norm_rel=0.021631207317113876, ref_abs_avg=22.80669403076172, test_abs_avg=22.811992645263672
production_forward2 grad[74] vs paper_forward: mean_abs=0.45283234119415283, max_abs=2.25, mean_rel=0.07734499126672745, max_rel=3.119025945663452, norm_rel=0.022059371694922447, ref_abs_avg=20.350566864013672, test_abs_avg=20.394933700561523
production_forward2 grad[75] vs paper_forward: mean_abs=0.5854726433753967, max_abs=4.78125, mean_rel=0.15383246541023254, max_rel=806.818115234375, norm_rel=0.024721762165427208, ref_abs_avg=23.71868133544922, test_abs_avg=23.720386505126953
production_forward2 grad[76] vs paper_forward: mean_abs=0.535799503326416, max_abs=4.5, mean_rel=0.2808782458305359, max_rel=1687.4998779296875, norm_rel=0.02308559976518154, ref_abs_avg=23.161487579345703, test_abs_avg=23.16394805908203
production_forward2 grad[77] vs paper_forward: mean_abs=0.4333444833755493, max_abs=1.76171875, mean_rel=0.15952828526496887, max_rel=24.61172103881836, norm_rel=0.022734511643648148, ref_abs_avg=18.934097290039062, test_abs_avg=18.932220458984375
production_forward2 grad[78] vs paper_forward: mean_abs=0.545069694519043, max_abs=4.5, mean_rel=0.15905526280403137, max_rel=1601.5802001953125, norm_rel=0.023945234715938568, ref_abs_avg=22.76857566833496, test_abs_avg=22.77061653137207
production_forward2 grad[79] vs paper_forward: mean_abs=0.500862181186676, max_abs=3.8125, mean_rel=0.2382848858833313, max_rel=1906.2498779296875, norm_rel=0.022901788353919983, ref_abs_avg=21.842647552490234, test_abs_avg=21.839797973632812
production_forward2 grad[80] vs paper_forward: mean_abs=0.367276668548584, max_abs=1.25390625, mean_rel=0.0758374035358429, max_rel=2.2079050540924072, norm_rel=0.020170288160443306, ref_abs_avg=18.13812828063965, test_abs_avg=18.12120819091797
production_forward2 grad[81] vs paper_forward: mean_abs=0.500059962272644, max_abs=4.5, mean_rel=0.14143186807632446, max_rel=1066.1151123046875, norm_rel=0.023456530645489693, ref_abs_avg=21.38694190979004, test_abs_avg=21.38799476623535
production_forward2 grad[82] vs paper_forward: mean_abs=0.466718852519989, max_abs=4.0, mean_rel=0.20084601640701294, max_rel=1624.9998779296875, norm_rel=0.02193378657102585, ref_abs_avg=21.231689453125, test_abs_avg=21.229454040527344
production_forward2 grad[83] vs paper_forward: mean_abs=0.3703620433807373, max_abs=1.375, mean_rel=0.13576382398605347, max_rel=16.58399772644043, norm_rel=0.022623756900429726, ref_abs_avg=16.200519561767578, test_abs_avg=16.23784637451172
production_forward2 grad[84] vs paper_forward: mean_abs=0.47242268919944763, max_abs=4.5, mean_rel=0.13808229565620422, max_rel=540.783203125, norm_rel=0.023113712668418884, ref_abs_avg=20.529434204101562, test_abs_avg=20.531858444213867
production_forward2 grad[85] vs paper_forward: mean_abs=0.43151700496673584, max_abs=4.5, mean_rel=0.21227584779262543, max_rel=1359.3748779296875, norm_rel=0.021457206457853317, ref_abs_avg=20.168514251708984, test_abs_avg=20.174163818359375
production_forward2 grad[86] vs paper_forward: mean_abs=0.3715047836303711, max_abs=1.5, mean_rel=0.09674204140901566, max_rel=6.6569437980651855, norm_rel=0.02257411740720272, ref_abs_avg=16.63687515258789, test_abs_avg=16.64946937561035
production_forward2 grad[87] vs paper_forward: mean_abs=0.4455168843269348, max_abs=4.5, mean_rel=0.134426087141037, max_rel=590.7275390625, norm_rel=0.02266121841967106, ref_abs_avg=19.7703800201416, test_abs_avg=19.77223014831543
production_forward2 grad[88] vs paper_forward: mean_abs=0.4070540964603424, max_abs=3.875, mean_rel=0.16276055574417114, max_rel=1187.5, norm_rel=0.02017282135784626, ref_abs_avg=20.136638641357422, test_abs_avg=20.14139175415039
production_forward2 grad[89] vs paper_forward: mean_abs=0.3538849353790283, max_abs=1.5, mean_rel=0.12747260928153992, max_rel=27.067031860351562, norm_rel=0.021419961005449295, ref_abs_avg=16.61333465576172, test_abs_avg=16.585054397583008
production_forward2 grad[90] vs paper_forward: mean_abs=0.4242118000984192, max_abs=6.0, mean_rel=0.13460347056388855, max_rel=659.8638305664062, norm_rel=0.02217642404139042, ref_abs_avg=19.270347595214844, test_abs_avg=19.271671295166016
production_forward2 grad[91] vs paper_forward: mean_abs=0.38681063055992126, max_abs=3.5, mean_rel=0.1768437623977661, max_rel=1296.8748779296875, norm_rel=0.02031945437192917, ref_abs_avg=19.06947898864746, test_abs_avg=19.075239181518555
production_forward2 grad[92] vs paper_forward: mean_abs=0.30743932723999023, max_abs=1.375, mean_rel=0.1654708832502365, max_rel=35.979679107666016, norm_rel=0.019397098571062088, ref_abs_avg=16.33472442626953, test_abs_avg=16.340938568115234
production_forward2 grad[93] vs paper_forward: mean_abs=0.3955245614051819, max_abs=5.0, mean_rel=0.1280735433101654, max_rel=509.2669372558594, norm_rel=0.02162330411374569, ref_abs_avg=18.52318572998047, test_abs_avg=18.5244140625
production_forward2 grad[94] vs paper_forward: mean_abs=0.36666691303253174, max_abs=4.0, mean_rel=0.16394636034965515, max_rel=1437.4998779296875, norm_rel=0.020093901082873344, ref_abs_avg=18.496227264404297, test_abs_avg=18.49152183532715
production_forward2 grad[95] vs paper_forward: mean_abs=0.31070947647094727, max_abs=1.25, mean_rel=0.10545015335083008, max_rel=7.560337543487549, norm_rel=0.019940821453928947, ref_abs_avg=15.836246490478516, test_abs_avg=15.833780288696289
production_forward2 grad[96] vs paper_forward: mean_abs=0.38468119502067566, max_abs=5.0, mean_rel=0.1336909830570221, max_rel=653.0397338867188, norm_rel=0.02132991887629032, ref_abs_avg=18.322147369384766, test_abs_avg=18.32379150390625
production_forward2 grad[97] vs paper_forward: mean_abs=0.33920252323150635, max_abs=4.453125, mean_rel=0.1496196687221527, max_rel=1390.6248779296875, norm_rel=0.018795862793922424, ref_abs_avg=18.27035903930664, test_abs_avg=18.27344512939453
identity layers + randn queries
torch_compile_phases_forward fwd+bwd:  165.902 ms
torch_compile_phases_forward bwd-only: 132.495 ms
torch_compile_phases_forward peak allocated: fwd=12.781 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.078 GiB, fwd+bwd=17.330 GiB
production_forward fwd+bwd:  116.369 ms
production_forward bwd-only: 95.950 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.303 GiB, fwd+bwd=10.303 GiB
production_forward2 fwd+bwd:  113.413 ms
production_forward2 bwd-only: 95.772 ms
production_forward2 peak allocated: fwd=3.071 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=3.303 GiB, fwd+bwd=11.303 GiB
paper_forward fwd+bwd:  382.326 ms
paper_forward bwd-only: 301.990 ms
paper_forward peak allocated: fwd=29.706 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.723 GiB, fwd+bwd=32.473 GiB
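
Note on the comparison metrics: the "vs paper_forward" lines in this log report mean_abs, max_abs, mean_rel, max_rel, norm_rel, ref_abs_avg and test_abs_avg, but the log does not show how they are computed. The sketch below is one plausible reconstruction, not the benchmark's actual code. The function name `compare`, the `eps` guard, and the exact formulas are all assumptions. It uses plain Python lists so that it runs without a GPU or PyTorch.

```python
import math

def compare(ref, test, eps=1e-12):
    """Hypothetical reconstruction of the per-tensor error metrics named in the log.

    ref and test are flat sequences of floats of equal length.
    The formulas are assumptions; the log itself does not define them.
    """
    n = len(ref)
    diffs = [abs(t - r) for r, t in zip(ref, test)]
    # Element-wise relative error; eps only guards against division by zero.
    rels = [d / (abs(r) + eps) for d, r in zip(diffs, ref)]
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    ref_norm = math.sqrt(sum(r * r for r in ref))
    return {
        "mean_abs": sum(diffs) / n,
        "max_abs": max(diffs),
        "mean_rel": sum(rels) / n,
        "max_rel": max(rels),
        # Ratio of L2 norms: dominated by large elements, not by tiny ones.
        "norm_rel": diff_norm / (ref_norm + eps),
        "ref_abs_avg": sum(abs(r) for r in ref) / n,
        "test_abs_avg": sum(abs(t) for t in test) / n,
    }

# One near-zero reference element (0.001) inflates max_rel
# while norm_rel stays moderate.
stats = compare([1.0, -2.0, 0.001], [1.5, -2.0, 0.003])
```

Under these assumed definitions, a single near-zero reference element can push max_rel into the hundreds or thousands while norm_rel stays near 0.02, which would be consistent with the pattern in the lines of this log. For that reason norm_rel is probably the more useful figure when judging overall agreement.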

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016092355363070965, max_abs=0.046875
production_forward grad[0] vs paper_forward: mean_abs=0.007971163839101791, max_abs=0.4765625, mean_rel=0.07055492699146271, max_rel=192.57015991210938, norm_rel=0.019276561215519905, ref_abs_avg=0.4473993182182312, test_abs_avg=0.44740986824035645
production_forward grad[1] vs paper_forward: mean_abs=6.95994234085083, max_abs=56.0, mean_rel=0.1264503449201584, max_rel=102.41001892089844, norm_rel=0.019364234060049057, ref_abs_avg=314.9085693359375, test_abs_avg=314.99017333984375
production_forward grad[2] vs paper_forward: mean_abs=1.1915178298950195, max_abs=4.5, mean_rel=0.11964760720729828, max_rel=13.830421447753906, norm_rel=0.02157379873096943, ref_abs_avg=53.2126579284668, test_abs_avg=53.16304016113281
production_forward grad[3] vs paper_forward: mean_abs=1.51218843460083, max_abs=10.0, mean_rel=0.17015504837036133, max_rel=1806.992431640625, norm_rel=0.024055011570453644, ref_abs_avg=63.250709533691406, test_abs_avg=63.261314392089844
production_forward grad[4] vs paper_forward: mean_abs=1.3958995342254639, max_abs=9.875, mean_rel=0.4167911410331726, max_rel=4406.25, norm_rel=0.02239380218088627, ref_abs_avg=62.72334289550781, test_abs_avg=62.72997283935547
production_forward grad[5] vs paper_forward: mean_abs=0.9609355926513672, max_abs=3.75, mean_rel=0.07887589931488037, max_rel=8.042097091674805, norm_rel=0.020031094551086426, ref_abs_avg=48.690643310546875, test_abs_avg=48.67787551879883
production_forward grad[6] vs paper_forward: mean_abs=1.318871259689331, max_abs=9.0, mean_rel=0.15899109840393066, max_rel=1371.826904296875, norm_rel=0.02378724329173565, ref_abs_avg=55.86088562011719, test_abs_avg=55.868202209472656
production_forward grad[7] vs paper_forward: mean_abs=1.2084522247314453, max_abs=8.25, mean_rel=0.30919480323791504, max_rel=4093.749755859375, norm_rel=0.022050509229302406, ref_abs_avg=55.14514923095703, test_abs_avg=55.151893615722656
production_forward grad[8] vs paper_forward: mean_abs=0.9695885181427002, max_abs=4.0, mean_rel=0.1448792815208435, max_rel=15.552237510681152, norm_rel=0.022698380053043365, ref_abs_avg=42.50764465332031, test_abs_avg=42.562870025634766
production_forward grad[9] vs paper_forward: mean_abs=1.2149603366851807, max_abs=8.0, mean_rel=0.16080713272094727, max_rel=2889.55810546875, norm_rel=0.02367720752954483, ref_abs_avg=51.68815612792969, test_abs_avg=51.68901824951172
production_forward grad[10] vs paper_forward: mean_abs=1.1102222204208374, max_abs=6.875, mean_rel=0.3073762059211731, max_rel=2687.499755859375, norm_rel=0.02188843861222267, ref_abs_avg=50.9837646484375, test_abs_avg=50.98223114013672
production_forward grad[11] vs paper_forward: mean_abs=0.9114522933959961, max_abs=3.75, mean_rel=0.0890994668006897, max_rel=5.128869533538818, norm_rel=0.02317184954881668, ref_abs_avg=39.32476806640625, test_abs_avg=39.315452575683594
production_forward grad[12] vs paper_forward: mean_abs=1.1134717464447021, max_abs=7.0, mean_rel=0.15187667310237885, max_rel=1469.2237548828125, norm_rel=0.023359347134828568, ref_abs_avg=47.99427032470703, test_abs_avg=47.998592376708984
production_forward grad[13] vs paper_forward: mean_abs=1.023221731185913, max_abs=5.875, mean_rel=0.3197785019874573, max_rel=2562.5, norm_rel=0.021654712036252022, ref_abs_avg=47.5930290222168, test_abs_avg=47.595924377441406
production_forward grad[14] vs paper_forward: mean_abs=0.8261470794677734, max_abs=2.96875, mean_rel=0.0787573754787445, max_rel=2.9278037548065186, norm_rel=0.021891098469495773, ref_abs_avg=38.134708404541016, test_abs_avg=38.18557357788086
production_forward grad[15] vs paper_forward: mean_abs=1.0434328317642212, max_abs=7.0, mean_rel=0.15630190074443817, max_rel=1112.3756103515625, norm_rel=0.023293767124414444, ref_abs_avg=45.09787368774414, test_abs_avg=45.101531982421875
production_forward grad[16] vs paper_forward: mean_abs=0.9549486041069031, max_abs=5.875, mean_rel=0.3113662898540497, max_rel=3374.999755859375, norm_rel=0.02139405533671379, ref_abs_avg=44.8675651550293, test_abs_avg=44.87144470214844
production_forward grad[17] vs paper_forward: mean_abs=0.8066422939300537, max_abs=3.0, mean_rel=0.12833641469478607, max_rel=6.352704048156738, norm_rel=0.023813797160983086, ref_abs_avg=32.931854248046875, test_abs_avg=32.95733642578125
production_forward grad[18] vs paper_forward: mean_abs=0.9884559512138367, max_abs=7.5, mean_rel=0.14426550269126892, max_rel=1344.5523681640625, norm_rel=0.023196537047624588, ref_abs_avg=42.893348693847656, test_abs_avg=42.89463806152344
production_forward grad[19] vs paper_forward: mean_abs=0.9055849313735962, max_abs=5.25, mean_rel=0.33344003558158875, max_rel=3187.499755859375, norm_rel=0.021546470001339912, ref_abs_avg=42.25879669189453, test_abs_avg=42.263999938964844
production_forward grad[20] vs paper_forward: mean_abs=0.7079133987426758, max_abs=2.75, mean_rel=0.10764683783054352, max_rel=12.152777671813965, norm_rel=0.02100117690861225, ref_abs_avg=34.0833854675293, test_abs_avg=34.044830322265625
production_forward grad[21] vs paper_forward: mean_abs=0.9357087016105652, max_abs=6.5, mean_rel=0.15282970666885376, max_rel=1939.208251953125, norm_rel=0.023107275366783142, ref_abs_avg=40.76611328125, test_abs_avg=40.770301818847656
production_forward grad[22] vs paper_forward: mean_abs=0.8596898317337036, max_abs=5.5, mean_rel=0.2913442850112915, max_rel=2562.5, norm_rel=0.0215216726064682, ref_abs_avg=40.17995834350586, test_abs_avg=40.17676544189453
production_forward grad[23] vs paper_forward: mean_abs=0.677777886390686, max_abs=2.78125, mean_rel=0.17813047766685486, max_rel=44.05498504638672, norm_rel=0.02269512601196766, ref_abs_avg=30.373912811279297, test_abs_avg=30.40947151184082
production_forward grad[24] vs paper_forward: mean_abs=0.8885911703109741, max_abs=6.0, mean_rel=0.136476069688797, max_rel=1336.8367919921875, norm_rel=0.022996414452791214, ref_abs_avg=38.88275909423828, test_abs_avg=38.8861083984375
production_forward grad[25] vs paper_forward: mean_abs=0.8176523447036743, max_abs=5.25, mean_rel=0.2549630403518677, max_rel=2624.999755859375, norm_rel=0.021299609914422035, ref_abs_avg=38.57349395751953, test_abs_avg=38.57208251953125
production_forward grad[26] vs paper_forward: mean_abs=0.7982873916625977, max_abs=3.6875, mean_rel=0.12357820570468903, max_rel=13.689048767089844, norm_rel=0.023302968591451645, ref_abs_avg=34.018409729003906, test_abs_avg=34.10856628417969
production_forward grad[27] vs paper_forward: mean_abs=1.0265858173370361, max_abs=8.0, mean_rel=0.16137190163135529, max_rel=2671.455810546875, norm_rel=0.02472255565226078, ref_abs_avg=41.744056701660156, test_abs_avg=41.74654769897461
production_forward grad[28] vs paper_forward: mean_abs=0.9495595693588257, max_abs=6.0, mean_rel=0.31867605447769165, max_rel=2812.499755859375, norm_rel=0.023409822955727577, ref_abs_avg=40.784915924072266, test_abs_avg=40.78733825683594
production_forward grad[29] vs paper_forward: mean_abs=0.6654243469238281, max_abs=2.625, mean_rel=0.10148084908723831, max_rel=17.097623825073242, norm_rel=0.02201872132718563, ref_abs_avg=30.983116149902344, test_abs_avg=30.967876434326172
production_forward grad[30] vs paper_forward: mean_abs=0.9501780271530151, max_abs=6.0, mean_rel=0.15970909595489502, max_rel=1020.16845703125, norm_rel=0.025140181183815002, ref_abs_avg=38.00835418701172, test_abs_avg=38.010826110839844
production_forward grad[31] vs paper_forward: mean_abs=0.8834102749824524, max_abs=5.21875, mean_rel=0.3154401183128357, max_rel=2593.749755859375, norm_rel=0.023636143654584885, ref_abs_avg=37.475345611572266, test_abs_avg=37.47502136230469
production_forward grad[32] vs paper_forward: mean_abs=0.6873229146003723, max_abs=2.5, mean_rel=0.0921289324760437, max_rel=4.288970470428467, norm_rel=0.023398593068122864, ref_abs_avg=29.158878326416016, test_abs_avg=29.140731811523438
production_forward grad[33] vs paper_forward: mean_abs=0.8910493850708008, max_abs=7.0, mean_rel=0.16208377480506897, max_rel=2007.0423583984375, norm_rel=0.025063328444957733, ref_abs_avg=35.724815368652344, test_abs_avg=35.72737503051758
production_forward grad[34] vs paper_forward: mean_abs=0.8299530744552612, max_abs=5.125, mean_rel=0.30357372760772705, max_rel=3062.499755859375, norm_rel=0.02363666333258152, ref_abs_avg=35.24349594116211, test_abs_avg=35.25129699707031
production_forward grad[35] vs paper_forward: mean_abs=0.6599171161651611, max_abs=2.5, mean_rel=0.2849936783313751, max_rel=63.034088134765625, norm_rel=0.02570251002907753, ref_abs_avg=25.96175765991211, test_abs_avg=25.953571319580078
production_forward grad[36] vs paper_forward: mean_abs=0.8390803933143616, max_abs=5.5, mean_rel=0.16254594922065735, max_rel=905.0597534179688, norm_rel=0.024939745664596558, ref_abs_avg=33.78959655761719, test_abs_avg=33.793556213378906
production_forward grad[37] vs paper_forward: mean_abs=0.7751044034957886, max_abs=5.5, mean_rel=0.264453649520874, max_rel=2656.249755859375, norm_rel=0.02326277457177639, ref_abs_avg=33.39024353027344, test_abs_avg=33.393978118896484
production_forward grad[38] vs paper_forward: mean_abs=0.6250369548797607, max_abs=2.5, mean_rel=0.33295297622680664, max_rel=104.87358093261719, norm_rel=0.02371005155146122, ref_abs_avg=26.99566078186035, test_abs_avg=27.026615142822266
production_forward grad[39] vs paper_forward: mean_abs=0.7915717363357544, max_abs=6.0, mean_rel=0.17152482271194458, max_rel=1809.5540771484375, norm_rel=0.02446706034243107, ref_abs_avg=32.49394607543945, test_abs_avg=32.493194580078125
production_forward grad[40] vs paper_forward: mean_abs=0.7327830791473389, max_abs=4.6875, mean_rel=0.23540902137756348, max_rel=2125.0, norm_rel=0.0232014711946249, ref_abs_avg=31.66717529296875, test_abs_avg=31.663177490234375
production_forward grad[41] vs paper_forward: mean_abs=0.5815401077270508, max_abs=2.125, mean_rel=0.09836018085479736, max_rel=8.213109016418457, norm_rel=0.02260088548064232, ref_abs_avg=25.971385955810547, test_abs_avg=26.005319595336914
production_forward grad[42] vs paper_forward: mean_abs=0.7451515197753906, max_abs=5.0, mean_rel=0.15504783391952515, max_rel=1454.02783203125, norm_rel=0.02435941994190216, ref_abs_avg=30.73038101196289, test_abs_avg=30.73143196105957
production_forward grad[43] vs paper_forward: mean_abs=0.6994441747665405, max_abs=4.25, mean_rel=0.26694920659065247, max_rel=2312.5, norm_rel=0.0230089258402586, ref_abs_avg=30.47168731689453, test_abs_avg=30.47878646850586
production_forward grad[44] vs paper_forward: mean_abs=0.5448967814445496, max_abs=3.25, mean_rel=0.12054488807916641, max_rel=13.119872093200684, norm_rel=0.02319372072815895, ref_abs_avg=24.20584487915039, test_abs_avg=24.188495635986328
production_forward grad[45] vs paper_forward: mean_abs=0.7148041725158691, max_abs=5.0, mean_rel=0.1530895084142685, max_rel=1836.5533447265625, norm_rel=0.02422626130282879, ref_abs_avg=29.643056869506836, test_abs_avg=29.645980834960938
production_forward grad[46] vs paper_forward: mean_abs=0.664910078048706, max_abs=4.3125, mean_rel=0.25530487298965454, max_rel=1734.3748779296875, norm_rel=0.022769156843423843, ref_abs_avg=29.314102172851562, test_abs_avg=29.31247329711914
production_forward grad[47] vs paper_forward: mean_abs=0.5133028030395508, max_abs=2.0, mean_rel=0.06880110502243042, max_rel=2.620873212814331, norm_rel=0.022559380158782005, ref_abs_avg=23.237648010253906, test_abs_avg=23.219966888427734
production_forward grad[48] vs paper_forward: mean_abs=0.6882851123809814, max_abs=5.0, mean_rel=0.15166646242141724, max_rel=860.6349487304688, norm_rel=0.02403740957379341, ref_abs_avg=28.746768951416016, test_abs_avg=28.7486572265625
production_forward grad[49] vs paper_forward: mean_abs=0.6369038820266724, max_abs=4.25, mean_rel=0.2322026789188385, max_rel=1953.1248779296875, norm_rel=0.022314956411719322, ref_abs_avg=28.5910587310791, test_abs_avg=28.590362548828125
production_forward grad[50] vs paper_forward: mean_abs=0.5485097169876099, max_abs=2.15625, mean_rel=0.13942626118659973, max_rel=30.397497177124023, norm_rel=0.024841150268912315, ref_abs_avg=22.66551399230957, test_abs_avg=22.635353088378906
production_forward grad[51] vs paper_forward: mean_abs=0.7466393709182739, max_abs=5.75, mean_rel=0.16022330522537231, max_rel=956.3894653320312, norm_rel=0.02527124434709549, ref_abs_avg=29.636146545410156, test_abs_avg=29.63681411743164
production_forward grad[52] vs paper_forward: mean_abs=0.692028284072876, max_abs=4.5, mean_rel=0.24968963861465454, max_rel=1999.9998779296875, norm_rel=0.023711657151579857, ref_abs_avg=29.26289176940918, test_abs_avg=29.265090942382812
production_forward grad[53] vs paper_forward: mean_abs=0.5419726371765137, max_abs=2.0, mean_rel=0.14410850405693054, max_rel=19.96592903137207, norm_rel=0.024095989763736725, ref_abs_avg=22.219139099121094, test_abs_avg=22.277015686035156
production_forward grad[54] vs paper_forward: mean_abs=0.6879128217697144, max_abs=4.5, mean_rel=0.17121174931526184, max_rel=1513.9661865234375, norm_rel=0.025002634152770042, ref_abs_avg=27.593563079833984, test_abs_avg=27.59238052368164
production_forward grad[55] vs paper_forward: mean_abs=0.6414936780929565, max_abs=4.5, mean_rel=0.29263705015182495, max_rel=2468.75, norm_rel=0.023651454597711563, ref_abs_avg=27.2285213470459, test_abs_avg=27.23308563232422
production_forward grad[56] vs paper_forward: mean_abs=0.49132537841796875, max_abs=2.25, mean_rel=0.11297635734081268, max_rel=12.693809509277344, norm_rel=0.022135592997074127, ref_abs_avg=21.958545684814453, test_abs_avg=21.939292907714844
production_forward grad[57] vs paper_forward: mean_abs=0.6461131572723389, max_abs=4.5, mean_rel=0.16488324105739594, max_rel=954.01904296875, norm_rel=0.02453557401895523, ref_abs_avg=26.36745262145996, test_abs_avg=26.366912841796875
production_forward grad[58] vs paper_forward: mean_abs=0.6004374027252197, max_abs=3.75, mean_rel=0.2736307680606842, max_rel=2656.249755859375, norm_rel=0.02305580861866474, ref_abs_avg=26.071117401123047, test_abs_avg=26.075775146484375
production_forward grad[59] vs paper_forward: mean_abs=0.4968385696411133, max_abs=1.78125, mean_rel=0.06810091435909271, max_rel=2.254194736480713, norm_rel=0.024995049461722374, ref_abs_avg=20.362323760986328, test_abs_avg=20.3835391998291
production_forward grad[60] vs paper_forward: mean_abs=0.6089609861373901, max_abs=5.0, mean_rel=0.15966203808784485, max_rel=1228.7672119140625, norm_rel=0.02422192320227623, ref_abs_avg=25.203224182128906, test_abs_avg=25.204166412353516
production_forward grad[61] vs paper_forward: mean_abs=0.5664370059967041, max_abs=3.5, mean_rel=0.2380768060684204, max_rel=1898.4373779296875, norm_rel=0.022365175187587738, ref_abs_avg=25.356918334960938, test_abs_avg=25.3582763671875
production_forward grad[62] vs paper_forward: mean_abs=0.43446874618530273, max_abs=1.75, mean_rel=0.10957174748182297, max_rel=26.669151306152344, norm_rel=0.022054927423596382, ref_abs_avg=20.063735961914062, test_abs_avg=20.06146812438965
production_forward grad[63] vs paper_forward: mean_abs=0.5849381685256958, max_abs=5.4375, mean_rel=0.15594696998596191, max_rel=1039.7362060546875, norm_rel=0.023560622707009315, ref_abs_avg=24.836318969726562, test_abs_avg=24.834163665771484
production_forward grad[64] vs paper_forward: mean_abs=0.5380040407180786, max_abs=3.84375, mean_rel=0.2162172794342041, max_rel=1999.9998779296875, norm_rel=0.022228822112083435, ref_abs_avg=24.186899185180664, test_abs_avg=24.18622398376465
production_forward grad[65] vs paper_forward: mean_abs=0.4121628403663635, max_abs=1.6875, mean_rel=0.32641541957855225, max_rel=118.98787689208984, norm_rel=0.02039211057126522, ref_abs_avg=20.521976470947266, test_abs_avg=20.460386276245117
production_forward grad[66] vs paper_forward: mean_abs=0.5541481375694275, max_abs=4.0, mean_rel=0.14688192307949066, max_rel=870.412353515625, norm_rel=0.023525241762399673, ref_abs_avg=23.60074234008789, test_abs_avg=23.601715087890625
production_forward grad[67] vs paper_forward: mean_abs=0.5141832232475281, max_abs=3.625, mean_rel=0.2132902294397354, max_rel=1343.7498779296875, norm_rel=0.022003693506121635, ref_abs_avg=23.368793487548828, test_abs_avg=23.36585235595703
production_forward grad[68] vs paper_forward: mean_abs=0.3952066898345947, max_abs=1.625, mean_rel=0.14978209137916565, max_rel=22.21766471862793, norm_rel=0.02061288245022297, ref_abs_avg=19.58149528503418, test_abs_avg=19.54160499572754
production_forward grad[69] vs paper_forward: mean_abs=0.5256715416908264, max_abs=4.0, mean_rel=0.14640313386917114, max_rel=653.61083984375, norm_rel=0.023196058347821236, ref_abs_avg=22.731292724609375, test_abs_avg=22.73199462890625
production_forward grad[70] vs paper_forward: mean_abs=0.4901716113090515, max_abs=4.5, mean_rel=0.233384907245636, max_rel=1874.9998779296875, norm_rel=0.0217482578009367, ref_abs_avg=22.597606658935547, test_abs_avg=22.59881591796875
production_forward grad[71] vs paper_forward: mean_abs=0.3918423652648926, max_abs=1.8125, mean_rel=0.13320192694664001, max_rel=16.73977279663086, norm_rel=0.02201785147190094, ref_abs_avg=18.31340217590332, test_abs_avg=18.3372802734375
production_forward grad[72] vs paper_forward: mean_abs=0.5071658492088318, max_abs=4.25, mean_rel=0.1493939310312271, max_rel=1389.4390869140625, norm_rel=0.02285759709775448, ref_abs_avg=22.219133377075195, test_abs_avg=22.219589233398438
production_forward grad[73] vs paper_forward: mean_abs=0.4720410108566284, max_abs=3.5, mean_rel=0.20424224436283112, max_rel=1125.0, norm_rel=0.02151854708790779, ref_abs_avg=21.963817596435547, test_abs_avg=21.973812103271484
production_forward grad[74] vs paper_forward: mean_abs=0.46495532989501953, max_abs=1.75, mean_rel=0.14420993626117706, max_rel=18.000877380371094, norm_rel=0.023633508011698723, ref_abs_avg=19.466127395629883, test_abs_avg=19.475162506103516
production_forward grad[75] vs paper_forward: mean_abs=0.5779284834861755, max_abs=5.0, mean_rel=0.15417766571044922, max_rel=651.0836791992188, norm_rel=0.024025367572903633, ref_abs_avg=24.101314544677734, test_abs_avg=24.100753784179688
production_forward grad[76] vs paper_forward: mean_abs=0.5356684923171997, max_abs=4.125, mean_rel=0.23901447653770447, max_rel=1437.4998779296875, norm_rel=0.0228184312582016, ref_abs_avg=23.581501007080078, test_abs_avg=23.58252716064453
production_forward grad[77] vs paper_forward: mean_abs=0.4239187240600586, max_abs=2.0625, mean_rel=0.10545726120471954, max_rel=9.985274314880371, norm_rel=0.023617740720510483, ref_abs_avg=17.751361846923828, test_abs_avg=17.76169776916504
production_forward grad[78] vs paper_forward: mean_abs=0.5261026620864868, max_abs=4.25, mean_rel=0.15195520222187042, max_rel=954.0471801757812, norm_rel=0.023483937606215477, ref_abs_avg=22.486791610717773, test_abs_avg=22.48755645751953
production_forward grad[79] vs paper_forward: mean_abs=0.489912748336792, max_abs=4.0, mean_rel=0.21654853224754333, max_rel=1281.25, norm_rel=0.02200867421925068, ref_abs_avg=22.341251373291016, test_abs_avg=22.33966827392578
production_forward grad[80] vs paper_forward: mean_abs=0.37285715341567993, max_abs=1.625, mean_rel=0.629192590713501, max_rel=273.3591003417969, norm_rel=0.02124684490263462, ref_abs_avg=17.880924224853516, test_abs_avg=17.891809463500977
production_forward grad[81] vs paper_forward: mean_abs=0.4896974563598633, max_abs=5.0, mean_rel=0.14852140843868256, max_rel=999.0321044921875, norm_rel=0.022881077602505684, ref_abs_avg=21.472576141357422, test_abs_avg=21.47152328491211
production_forward grad[82] vs paper_forward: mean_abs=0.444197416305542, max_abs=3.875, mean_rel=0.1753196120262146, max_rel=1343.7498779296875, norm_rel=0.021095946431159973, ref_abs_avg=21.047569274902344, test_abs_avg=21.04400634765625
production_forward grad[83] vs paper_forward: mean_abs=0.35048341751098633, max_abs=1.375, mean_rel=0.07673291862010956, max_rel=2.2230780124664307, norm_rel=0.02258709818124771, ref_abs_avg=15.498505592346191, test_abs_avg=15.461387634277344
production_forward grad[84] vs paper_forward: mean_abs=0.45590996742248535, max_abs=4.0, mean_rel=0.1383208930492401, max_rel=1273.8724365234375, norm_rel=0.022293563932180405, ref_abs_avg=20.542238235473633, test_abs_avg=20.5426025390625
production_forward grad[85] vs paper_forward: mean_abs=0.41605907678604126, max_abs=3.75, mean_rel=0.1809394210577011, max_rel=1273.4375, norm_rel=0.02055245079100132, ref_abs_avg=20.267139434814453, test_abs_avg=20.26219940185547
production_forward grad[86] vs paper_forward: mean_abs=0.3617713451385498, max_abs=1.25, mean_rel=0.19818046689033508, max_rel=40.98674011230469, norm_rel=0.021856291219592094, ref_abs_avg=16.077983856201172, test_abs_avg=16.08685874938965
production_forward grad[87] vs paper_forward: mean_abs=0.4332053065299988, max_abs=4.0, mean_rel=0.13161945343017578, max_rel=845.0455932617188, norm_rel=0.021887151524424553, ref_abs_avg=19.92452621459961, test_abs_avg=19.924142837524414
production_forward grad[88] vs paper_forward: mean_abs=0.40167680382728577, max_abs=4.0, mean_rel=0.18090969324111938, max_rel=1281.25, norm_rel=0.02067462168633938, ref_abs_avg=19.628000259399414, test_abs_avg=19.6345272064209
production_forward grad[89] vs paper_forward: mean_abs=0.30411720275878906, max_abs=1.25, mean_rel=0.06484643369913101, max_rel=3.0731563568115234, norm_rel=0.01928488165140152, ref_abs_avg=15.84916877746582, test_abs_avg=15.85599136352539
production_forward grad[90] vs paper_forward: mean_abs=0.401860773563385, max_abs=4.0, mean_rel=0.12633833289146423, max_rel=655.1764526367188, norm_rel=0.021370666101574898, ref_abs_avg=18.983341217041016, test_abs_avg=18.982128143310547
production_forward grad[91] vs paper_forward: mean_abs=0.3722546696662903, max_abs=4.25, mean_rel=0.19745847582817078, max_rel=2093.75, norm_rel=0.01969139836728573, ref_abs_avg=19.128070831298828, test_abs_avg=19.125154495239258
production_forward grad[92] vs paper_forward: mean_abs=0.2981194257736206, max_abs=1.25, mean_rel=0.08256278932094574, max_rel=6.447152137756348, norm_rel=0.01764027774333954, ref_abs_avg=17.263586044311523, test_abs_avg=17.263702392578125
production_forward grad[93] vs paper_forward: mean_abs=0.38473665714263916, max_abs=4.75, mean_rel=0.11906027048826218, max_rel=939.914794921875, norm_rel=0.020790694281458855, ref_abs_avg=18.75077247619629, test_abs_avg=18.749496459960938
production_forward grad[94] vs paper_forward: mean_abs=0.35213586688041687, max_abs=3.25, mean_rel=0.17932522296905518, max_rel=1257.8125, norm_rel=0.01920969970524311, ref_abs_avg=18.536731719970703, test_abs_avg=18.538330078125
production_forward grad[95] vs paper_forward: mean_abs=0.28557920455932617, max_abs=1.03125, mean_rel=0.08527122437953949, max_rel=4.370003700256348, norm_rel=0.018349485471844673, ref_abs_avg=15.63560676574707, test_abs_avg=15.615863800048828
production_forward grad[96] vs paper_forward: mean_abs=0.36697056889533997, max_abs=5.0, mean_rel=0.12523186206817627, max_rel=778.3054809570312, norm_rel=0.020413100719451904, ref_abs_avg=18.308189392089844, test_abs_avg=18.30819320678711
production_forward grad[97] vs paper_forward: mean_abs=0.326904296875, max_abs=3.5, mean_rel=0.15047910809516907, max_rel=781.2499389648438, norm_rel=0.018140144646167755, ref_abs_avg=18.332237243652344, test_abs_avg=18.343576431274414
torch_compile_phases_forward vs paper_forward output: mean_abs=0.0016123754903674126, max_abs=0.0390625
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008311434648931026, max_abs=0.5078125, mean_rel=0.07321518659591675, max_rel=194.01998901367188, norm_rel=0.01999242603778839, ref_abs_avg=0.4473993182182312, test_abs_avg=0.4473975896835327
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.111843109130859, max_abs=56.0, mean_rel=0.18538464605808258, max_rel=541.2750854492188, norm_rel=0.01982272043824196, ref_abs_avg=314.9085693359375, test_abs_avg=314.98046875
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.193886160850525, max_abs=5.0, mean_rel=0.09949721395969391, max_rel=8.673612594604492, norm_rel=0.021962467581033707, ref_abs_avg=53.2126579284668, test_abs_avg=53.07514572143555
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.5605573654174805, max_abs=10.0, mean_rel=0.18717700242996216, max_rel=3221.850341796875, norm_rel=0.02480403520166874, ref_abs_avg=63.250709533691406, test_abs_avg=63.26166534423828
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.4476771354675293, max_abs=9.0, mean_rel=0.3617466688156128, max_rel=5624.99951171875, norm_rel=0.02320680022239685, ref_abs_avg=62.72334289550781, test_abs_avg=62.72529602050781
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0012831687927246, max_abs=4.5, mean_rel=0.08839548379182816, max_rel=8.647263526916504, norm_rel=0.021393755450844765, ref_abs_avg=48.690643310546875, test_abs_avg=48.67057800292969
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.3600518703460693, max_abs=10.0, mean_rel=0.1619967818260193, max_rel=938.235595703125, norm_rel=0.024517761543393135, ref_abs_avg=55.86088562011719, test_abs_avg=55.86560821533203
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.2513171434402466, max_abs=7.5625, mean_rel=0.33832138776779175, max_rel=4625.0, norm_rel=0.022824158892035484, ref_abs_avg=55.14514923095703, test_abs_avg=55.14922332763672
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9416742324829102, max_abs=4.0, mean_rel=0.14616701006889343, max_rel=13.799100875854492, norm_rel=0.02254336141049862, ref_abs_avg=42.50764465332031, test_abs_avg=42.53620910644531
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.251700758934021, max_abs=9.0, mean_rel=0.1668384075164795, max_rel=3871.411865234375, norm_rel=0.024359093979001045, ref_abs_avg=51.68815612792969, test_abs_avg=51.68736267089844
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.1474330425262451, max_abs=7.0, mean_rel=0.3326066732406616, max_rel=3687.499755859375, norm_rel=0.022614099085330963, ref_abs_avg=50.9837646484375, test_abs_avg=50.98277282714844
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.9510295391082764, max_abs=3.5, mean_rel=0.09975288808345795, max_rel=11.504714965820312, norm_rel=0.024014033377170563, ref_abs_avg=39.32476806640625, test_abs_avg=39.287261962890625
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.1436837911605835, max_abs=8.0, mean_rel=0.1520269513130188, max_rel=820.6250610351562, norm_rel=0.023989573121070862, ref_abs_avg=47.99427032470703, test_abs_avg=47.99729537963867
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.0570892095565796, max_abs=6.8125, mean_rel=0.31669455766677856, max_rel=3437.499755859375, norm_rel=0.022340595722198486, ref_abs_avg=47.5930290222168, test_abs_avg=47.59325408935547
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8182077407836914, max_abs=3.0, mean_rel=0.09231340885162354, max_rel=5.56832218170166, norm_rel=0.02200840227305889, ref_abs_avg=38.134708404541016, test_abs_avg=38.23713684082031
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.073957920074463, max_abs=7.0, mean_rel=0.15767601132392883, max_rel=767.3787231445312, norm_rel=0.02394162490963936, ref_abs_avg=45.09787368774414, test_abs_avg=45.098663330078125
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=0.9834885001182556, max_abs=6.03125, mean_rel=0.33017557859420776, max_rel=3187.499755859375, norm_rel=0.022043609991669655, ref_abs_avg=44.8675651550293, test_abs_avg=44.87086486816406
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8327345848083496, max_abs=3.0, mean_rel=0.12622295320034027, max_rel=11.44918441772461, norm_rel=0.024446547031402588, ref_abs_avg=32.931854248046875, test_abs_avg=32.97283172607422
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.014225959777832, max_abs=7.0, mean_rel=0.15137308835983276, max_rel=2486.29345703125, norm_rel=0.0237866397947073, ref_abs_avg=42.893348693847656, test_abs_avg=42.89383316040039
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9296811819076538, max_abs=5.25, mean_rel=0.30623045563697815, max_rel=2718.749755859375, norm_rel=0.022125286981463432, ref_abs_avg=42.25879669189453, test_abs_avg=42.26495361328125
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7179808020591736, max_abs=2.75, mean_rel=0.11587724089622498, max_rel=12.648809432983398, norm_rel=0.020866572856903076, ref_abs_avg=34.0833854675293, test_abs_avg=34.04962158203125
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=0.959053635597229, max_abs=6.5, mean_rel=0.15830200910568237, max_rel=2331.243408203125, norm_rel=0.023662570863962173, ref_abs_avg=40.76611328125, test_abs_avg=40.769439697265625
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.8837577700614929, max_abs=5.5, mean_rel=0.2968333661556244, max_rel=3062.499755859375, norm_rel=0.022095099091529846, ref_abs_avg=40.17995834350586, test_abs_avg=40.177940368652344
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.6992253065109253, max_abs=2.828125, mean_rel=0.10891726613044739, max_rel=6.167509078979492, norm_rel=0.02309422194957733, ref_abs_avg=30.373912811279297, test_abs_avg=30.396093368530273
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9092274904251099, max_abs=7.0, mean_rel=0.14098048210144043, max_rel=654.3975830078125, norm_rel=0.02351774089038372, ref_abs_avg=38.88275909423828, test_abs_avg=38.884620666503906
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8390332460403442, max_abs=5.25, mean_rel=0.2833828628063202, max_rel=2125.0, norm_rel=0.021854523569345474, ref_abs_avg=38.57349395751953, test_abs_avg=38.572418212890625
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.7912511825561523, max_abs=3.1875, mean_rel=0.12602688372135162, max_rel=19.617639541625977, norm_rel=0.023605329915881157, ref_abs_avg=34.018409729003906, test_abs_avg=34.106773376464844
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.0500563383102417, max_abs=6.5, mean_rel=0.16471844911575317, max_rel=2581.544921875, norm_rel=0.02528665028512478, ref_abs_avg=41.744056701660156, test_abs_avg=41.7462272644043
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=0.9711034297943115, max_abs=6.5, mean_rel=0.33877065777778625, max_rel=2437.5, norm_rel=0.023938745260238647, ref_abs_avg=40.784915924072266, test_abs_avg=40.787696838378906
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.7105398178100586, max_abs=2.375, mean_rel=0.11404092609882355, max_rel=21.481250762939453, norm_rel=0.022987807169556618, ref_abs_avg=30.983116149902344, test_abs_avg=30.984188079833984
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=0.9708061814308167, max_abs=6.0, mean_rel=0.1637376844882965, max_rel=656.9788818359375, norm_rel=0.02567048743367195, ref_abs_avg=38.00835418701172, test_abs_avg=38.00938415527344
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9057822227478027, max_abs=5.75, mean_rel=0.309795081615448, max_rel=3718.749755859375, norm_rel=0.02422650158405304, ref_abs_avg=37.475345611572266, test_abs_avg=37.473304748535156
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7073187828063965, max_abs=3.0625, mean_rel=0.10185528546571732, max_rel=6.359087944030762, norm_rel=0.024185145273804665, ref_abs_avg=29.158878326416016, test_abs_avg=29.208267211914062
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9086360931396484, max_abs=9.0, mean_rel=0.16409139335155487, max_rel=2207.732177734375, norm_rel=0.025564786046743393, ref_abs_avg=35.724815368652344, test_abs_avg=35.72662353515625
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.8505715131759644, max_abs=5.25, mean_rel=0.3054463863372803, max_rel=2500.0, norm_rel=0.0242178812623024, ref_abs_avg=35.24349594116211, test_abs_avg=35.24897003173828
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.6670458316802979, max_abs=2.3125, mean_rel=0.3502911925315857, max_rel=88.16226959228516, norm_rel=0.025931749492883682, ref_abs_avg=25.96175765991211, test_abs_avg=25.944576263427734
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8533786535263062, max_abs=6.0, mean_rel=0.16447719931602478, max_rel=1229.3287353515625, norm_rel=0.025354672223329544, ref_abs_avg=33.78959655761719, test_abs_avg=33.793495178222656
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.7937626242637634, max_abs=4.75, mean_rel=0.26855897903442383, max_rel=3156.249755859375, norm_rel=0.023788217455148697, ref_abs_avg=33.39024353027344, test_abs_avg=33.39171600341797
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.6200828552246094, max_abs=2.75, mean_rel=0.891637921333313, max_rel=275.33917236328125, norm_rel=0.02362828701734543, ref_abs_avg=26.99566078186035, test_abs_avg=27.020898818969727
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.804487943649292, max_abs=6.0, mean_rel=0.17186133563518524, max_rel=1649.86279296875, norm_rel=0.024861562997102737, ref_abs_avg=32.49394607543945, test_abs_avg=32.49263000488281
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.745781421661377, max_abs=5.0, mean_rel=0.2495238035917282, max_rel=2937.499755859375, norm_rel=0.02361064963042736, ref_abs_avg=31.66717529296875, test_abs_avg=31.661556243896484
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.5970935821533203, max_abs=2.125, mean_rel=0.12133460491895676, max_rel=13.544770240783691, norm_rel=0.022762127220630646, ref_abs_avg=25.971385955810547, test_abs_avg=25.966045379638672
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.756209135055542, max_abs=6.0, mean_rel=0.15666717290878296, max_rel=1522.138916015625, norm_rel=0.024710776284337044, ref_abs_avg=30.73038101196289, test_abs_avg=30.73160171508789
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7127317190170288, max_abs=4.5, mean_rel=0.2834676504135132, max_rel=2187.5, norm_rel=0.02343444526195526, ref_abs_avg=30.47168731689453, test_abs_avg=30.479183197021484
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5614283680915833, max_abs=2.625, mean_rel=0.11280694603919983, max_rel=11.350821495056152, norm_rel=0.023840077221393585, ref_abs_avg=24.20584487915039, test_abs_avg=24.207664489746094
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7251889705657959, max_abs=5.0, mean_rel=0.15247012674808502, max_rel=1298.655517578125, norm_rel=0.02456042356789112, ref_abs_avg=29.643056869506836, test_abs_avg=29.645240783691406
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.6725168824195862, max_abs=4.625, mean_rel=0.2705259919166565, max_rel=1828.1248779296875, norm_rel=0.022997785359621048, ref_abs_avg=29.314102172851562, test_abs_avg=29.31157684326172
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.5387649536132812, max_abs=2.0, mean_rel=0.07127068936824799, max_rel=2.526197671890259, norm_rel=0.02348485216498375, ref_abs_avg=23.237648010253906, test_abs_avg=23.220115661621094
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.6960878372192383, max_abs=6.0, mean_rel=0.15249906480312347, max_rel=1092.6019287109375, norm_rel=0.02431890368461609, ref_abs_avg=28.746768951416016, test_abs_avg=28.748260498046875
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6452208757400513, max_abs=4.25, mean_rel=0.253635436296463, max_rel=2343.75, norm_rel=0.0226011760532856, ref_abs_avg=28.5910587310791, test_abs_avg=28.590946197509766
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.5272649526596069, max_abs=2.46875, mean_rel=0.23583538830280304, max_rel=76.38228607177734, norm_rel=0.024201931431889534, ref_abs_avg=22.66551399230957, test_abs_avg=22.610702514648438
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.7592200040817261, max_abs=6.25, mean_rel=0.16493189334869385, max_rel=1312.6639404296875, norm_rel=0.025684181600809097, ref_abs_avg=29.636146545410156, test_abs_avg=29.636333465576172
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7064306139945984, max_abs=4.5078125, mean_rel=0.25409311056137085, max_rel=1812.4998779296875, norm_rel=0.024218525737524033, ref_abs_avg=29.26289176940918, test_abs_avg=29.268117904663086
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5516791343688965, max_abs=2.0, mean_rel=0.12988749146461487, max_rel=18.547880172729492, norm_rel=0.024389158934354782, ref_abs_avg=22.219139099121094, test_abs_avg=22.244983673095703
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.6976712942123413, max_abs=4.5, mean_rel=0.1687430441379547, max_rel=1220.34375, norm_rel=0.025354381650686264, ref_abs_avg=27.593563079833984, test_abs_avg=27.592620849609375
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.6521275043487549, max_abs=5.5, mean_rel=0.2947330176830292, max_rel=2937.499755859375, norm_rel=0.024052642285823822, ref_abs_avg=27.2285213470459, test_abs_avg=27.228206634521484
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.49982500076293945, max_abs=2.25, mean_rel=0.12273307144641876, max_rel=13.21939468383789, norm_rel=0.022722309455275536, ref_abs_avg=21.958545684814453, test_abs_avg=21.93349838256836
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.6550264358520508, max_abs=4.5, mean_rel=0.16991592943668365, max_rel=1419.3101806640625, norm_rel=0.024870485067367554, ref_abs_avg=26.36745262145996, test_abs_avg=26.36757469177246
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6105122566223145, max_abs=4.125, mean_rel=0.2625335454940796, max_rel=2375.0, norm_rel=0.023427875712513924, ref_abs_avg=26.071117401123047, test_abs_avg=26.074922561645508
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.4869852066040039, max_abs=2.125, mean_rel=0.07428585737943649, max_rel=4.258906841278076, norm_rel=0.024665242061018944, ref_abs_avg=20.362323760986328, test_abs_avg=20.371322631835938
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6166907548904419, max_abs=5.0, mean_rel=0.16341885924339294, max_rel=1320.2938232421875, norm_rel=0.024512778967618942, ref_abs_avg=25.203224182128906, test_abs_avg=25.203441619873047
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.5729906558990479, max_abs=4.0, mean_rel=0.25184401869773865, max_rel=1914.0623779296875, norm_rel=0.022638969123363495, ref_abs_avg=25.356918334960938, test_abs_avg=25.355243682861328
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.43343162536621094, max_abs=1.734375, mean_rel=0.10093633830547333, max_rel=23.235986709594727, norm_rel=0.021961815655231476, ref_abs_avg=20.063735961914062, test_abs_avg=20.066640853881836
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.5915123224258423, max_abs=5.0, mean_rel=0.15660974383354187, max_rel=1147.9854736328125, norm_rel=0.023814033716917038, ref_abs_avg=24.836318969726562, test_abs_avg=24.833454132080078
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5462262630462646, max_abs=3.96875, mean_rel=0.2176961451768875, max_rel=1812.4998779296875, norm_rel=0.022582780569791794, ref_abs_avg=24.186899185180664, test_abs_avg=24.18578338623047
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.40244966745376587, max_abs=1.5, mean_rel=0.3204821050167084, max_rel=120.81796264648438, norm_rel=0.020319927483797073, ref_abs_avg=20.521976470947266, test_abs_avg=20.460773468017578
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5591413974761963, max_abs=4.0, mean_rel=0.1504502296447754, max_rel=1058.8231201171875, norm_rel=0.02372111938893795, ref_abs_avg=23.60074234008789, test_abs_avg=23.600833892822266
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5216182470321655, max_abs=3.75, mean_rel=0.2179446816444397, max_rel=1437.4998779296875, norm_rel=0.022319938987493515, ref_abs_avg=23.368793487548828, test_abs_avg=23.367835998535156
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.40605998039245605, max_abs=1.625, mean_rel=0.18714094161987305, max_rel=30.583206176757812, norm_rel=0.021345533430576324, ref_abs_avg=19.58149528503418, test_abs_avg=19.5546932220459
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5303521156311035, max_abs=4.0, mean_rel=0.14657938480377197, max_rel=1043.1165771484375, norm_rel=0.023391174152493477, ref_abs_avg=22.731292724609375, test_abs_avg=22.7310791015625
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.49376121163368225, max_abs=3.75, mean_rel=0.22595937550067902, max_rel=2093.75, norm_rel=0.021878760308027267, ref_abs_avg=22.597606658935547, test_abs_avg=22.598060607910156
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.3945188522338867, max_abs=1.75, mean_rel=0.12123395502567291, max_rel=11.854658126831055, norm_rel=0.02222112938761711, ref_abs_avg=18.31340217590332, test_abs_avg=18.339645385742188
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5105087757110596, max_abs=4.5, mean_rel=0.14786161482334137, max_rel=1141.2269287109375, norm_rel=0.023003241047263145, ref_abs_avg=22.219133377075195, test_abs_avg=22.219057083129883
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4737522602081299, max_abs=3.5, mean_rel=0.2210141122341156, max_rel=1281.25, norm_rel=0.0216312687844038, ref_abs_avg=21.963817596435547, test_abs_avg=21.975305557250977
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4675321578979492, max_abs=1.75, mean_rel=0.15925094485282898, max_rel=20.892065048217773, norm_rel=0.02374589629471302, ref_abs_avg=19.466127395629883, test_abs_avg=19.478740692138672
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5826588869094849, max_abs=5.0, mean_rel=0.15505239367485046, max_rel=764.6347045898438, norm_rel=0.024211235344409943, ref_abs_avg=24.101314544677734, test_abs_avg=24.100032806396484
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5413832068443298, max_abs=4.0, mean_rel=0.2358206808567047, max_rel=1296.8748779296875, norm_rel=0.023068206384778023, ref_abs_avg=23.581501007080078, test_abs_avg=23.585628509521484
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4268937110900879, max_abs=1.9375, mean_rel=0.11658492684364319, max_rel=10.886878967285156, norm_rel=0.023893510922789574, ref_abs_avg=17.751361846923828, test_abs_avg=17.782520294189453
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5316554307937622, max_abs=4.5, mean_rel=0.15185758471488953, max_rel=762.1256713867188, norm_rel=0.02371290512382984, ref_abs_avg=22.486791610717773, test_abs_avg=22.487438201904297
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.49586352705955505, max_abs=4.3125, mean_rel=0.2283894121646881, max_rel=1328.1248779296875, norm_rel=0.022242942824959755, ref_abs_avg=22.341251373291016, test_abs_avg=22.33757781982422
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.3768356442451477, max_abs=1.75, mean_rel=0.5239788293838501, max_rel=219.5283660888672, norm_rel=0.021644996479153633, ref_abs_avg=17.880924224853516, test_abs_avg=17.90464973449707
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.49395036697387695, max_abs=4.5, mean_rel=0.14924654364585876, max_rel=687.5926513671875, norm_rel=0.023063747212290764, ref_abs_avg=21.472576141357422, test_abs_avg=21.47119903564453
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.44704657793045044, max_abs=3.5, mean_rel=0.1817607879638672, max_rel=1312.4998779296875, norm_rel=0.0212475024163723, ref_abs_avg=21.047569274902344, test_abs_avg=21.043067932128906
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.34723639488220215, max_abs=1.4375, mean_rel=0.08204711228609085, max_rel=2.8829782009124756, norm_rel=0.022564327344298363, ref_abs_avg=15.498505592346191, test_abs_avg=15.46606159210205
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.4593695402145386, max_abs=4.5, mean_rel=0.13922317326068878, max_rel=971.9857177734375, norm_rel=0.022441063076257706, ref_abs_avg=20.542238235473633, test_abs_avg=20.543014526367188
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.4172334671020508, max_abs=3.5, mean_rel=0.17876440286636353, max_rel=1617.1873779296875, norm_rel=0.020596114918589592, ref_abs_avg=20.267139434814453, test_abs_avg=20.261398315429688
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3438842296600342, max_abs=1.375, mean_rel=0.1899777352809906, max_rel=30.334579467773438, norm_rel=0.021116919815540314, ref_abs_avg=16.077983856201172, test_abs_avg=16.06802749633789
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.43546414375305176, max_abs=4.5, mean_rel=0.13221897184848785, max_rel=851.8070068359375, norm_rel=0.021970931440591812, ref_abs_avg=19.92452621459961, test_abs_avg=19.92400360107422
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.4047945737838745, max_abs=4.0, mean_rel=0.19002476334571838, max_rel=1390.6248779296875, norm_rel=0.02084830030798912, ref_abs_avg=19.628000259399414, test_abs_avg=19.63589096069336
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.31281769275665283, max_abs=1.25, mean_rel=0.07688146829605103, max_rel=12.817310333251953, norm_rel=0.019825465977191925, ref_abs_avg=15.84916877746582, test_abs_avg=15.860191345214844
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.4038129448890686, max_abs=4.0, mean_rel=0.1292780041694641, max_rel=912.3463134765625, norm_rel=0.021452127024531364, ref_abs_avg=18.983341217041016, test_abs_avg=18.98116111755371
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.3762454688549042, max_abs=4.25, mean_rel=0.18253766000270844, max_rel=1640.6248779296875, norm_rel=0.019866714254021645, ref_abs_avg=19.128070831298828, test_abs_avg=19.125904083251953
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.3120384216308594, max_abs=1.125, mean_rel=0.0849735364317894, max_rel=5.974120140075684, norm_rel=0.018229136243462563, ref_abs_avg=17.263586044311523, test_abs_avg=17.26346778869629
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.38581955432891846, max_abs=4.0, mean_rel=0.11991152167320251, max_rel=837.9898681640625, norm_rel=0.020835384726524353, ref_abs_avg=18.75077247619629, test_abs_avg=18.749141693115234
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3534376621246338, max_abs=3.25, mean_rel=0.17597198486328125, max_rel=1171.875, norm_rel=0.01930992491543293, ref_abs_avg=18.536731719970703, test_abs_avg=18.537199020385742
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.28865718841552734, max_abs=1.0, mean_rel=0.1143401563167572, max_rel=10.306050300598145, norm_rel=0.018635593354701996, ref_abs_avg=15.63560676574707, test_abs_avg=15.614550590515137
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.3678312599658966, max_abs=4.5, mean_rel=0.12065628170967102, max_rel=632.0547485351562, norm_rel=0.020456518977880478, ref_abs_avg=18.308189392089844, test_abs_avg=18.30759048461914
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.33150431513786316, max_abs=3.625, mean_rel=0.1474781185388565, max_rel=718.7499389648438, norm_rel=0.018454352393746376, ref_abs_avg=18.332237243652344, test_abs_avg=18.348270416259766
production_forward2 vs paper_forward output: mean_abs=0.0016092355363070965, max_abs=0.046875
production_forward2 grad[0] vs paper_forward: mean_abs=0.007971163839101791, max_abs=0.4765625, mean_rel=0.07055492699146271, max_rel=192.57015991210938, norm_rel=0.019276561215519905, ref_abs_avg=0.4473993182182312, test_abs_avg=0.44740986824035645
production_forward2 grad[1] vs paper_forward: mean_abs=6.960133075714111, max_abs=56.0, mean_rel=0.12645037472248077, max_rel=102.41001892089844, norm_rel=0.019364606589078903, ref_abs_avg=314.9085693359375, test_abs_avg=314.9899597167969
production_forward2 grad[2] vs paper_forward: mean_abs=1.1915178298950195, max_abs=4.5, mean_rel=0.11964760720729828, max_rel=13.830421447753906, norm_rel=0.02157379873096943, ref_abs_avg=53.2126579284668, test_abs_avg=53.16304016113281
production_forward2 grad[3] vs paper_forward: mean_abs=1.51218843460083, max_abs=10.0, mean_rel=0.17015504837036133, max_rel=1806.992431640625, norm_rel=0.024055011570453644, ref_abs_avg=63.250709533691406, test_abs_avg=63.261314392089844
production_forward2 grad[4] vs paper_forward: mean_abs=1.3958995342254639, max_abs=9.875, mean_rel=0.4167911410331726, max_rel=4406.25, norm_rel=0.02239380218088627, ref_abs_avg=62.72334289550781, test_abs_avg=62.72997283935547
production_forward2 grad[5] vs paper_forward: mean_abs=0.9609355926513672, max_abs=3.75, mean_rel=0.07887589931488037, max_rel=8.042097091674805, norm_rel=0.020031094551086426, ref_abs_avg=48.690643310546875, test_abs_avg=48.67787551879883
production_forward2 grad[6] vs paper_forward: mean_abs=1.318871259689331, max_abs=9.0, mean_rel=0.15899109840393066, max_rel=1371.826904296875, norm_rel=0.02378724329173565, ref_abs_avg=55.86088562011719, test_abs_avg=55.868202209472656
production_forward2 grad[7] vs paper_forward: mean_abs=1.2084522247314453, max_abs=8.25, mean_rel=0.30919480323791504, max_rel=4093.749755859375, norm_rel=0.022050509229302406, ref_abs_avg=55.14514923095703, test_abs_avg=55.151893615722656
production_forward2 grad[8] vs paper_forward: mean_abs=0.9695885181427002, max_abs=4.0, mean_rel=0.1448792815208435, max_rel=15.552237510681152, norm_rel=0.022698380053043365, ref_abs_avg=42.50764465332031, test_abs_avg=42.562870025634766
production_forward2 grad[9] vs paper_forward: mean_abs=1.2149603366851807, max_abs=8.0, mean_rel=0.16080713272094727, max_rel=2889.55810546875, norm_rel=0.02367720752954483, ref_abs_avg=51.68815612792969, test_abs_avg=51.68901824951172
production_forward2 grad[10] vs paper_forward: mean_abs=1.1102222204208374, max_abs=6.875, mean_rel=0.3073762059211731, max_rel=2687.499755859375, norm_rel=0.02188843861222267, ref_abs_avg=50.9837646484375, test_abs_avg=50.98223114013672
production_forward2 grad[11] vs paper_forward: mean_abs=0.9114522933959961, max_abs=3.75, mean_rel=0.0890994668006897, max_rel=5.128869533538818, norm_rel=0.02317184954881668, ref_abs_avg=39.32476806640625, test_abs_avg=39.315452575683594
production_forward2 grad[12] vs paper_forward: mean_abs=1.1134717464447021, max_abs=7.0, mean_rel=0.15187667310237885, max_rel=1469.2237548828125, norm_rel=0.023359347134828568, ref_abs_avg=47.99427032470703, test_abs_avg=47.998592376708984
production_forward2 grad[13] vs paper_forward: mean_abs=1.023221731185913, max_abs=5.875, mean_rel=0.3197785019874573, max_rel=2562.5, norm_rel=0.021654712036252022, ref_abs_avg=47.5930290222168, test_abs_avg=47.595924377441406
production_forward2 grad[14] vs paper_forward: mean_abs=0.8261470794677734, max_abs=2.96875, mean_rel=0.0787573754787445, max_rel=2.9278037548065186, norm_rel=0.021891098469495773, ref_abs_avg=38.134708404541016, test_abs_avg=38.18557357788086
production_forward2 grad[15] vs paper_forward: mean_abs=1.0434328317642212, max_abs=7.0, mean_rel=0.15630190074443817, max_rel=1112.3756103515625, norm_rel=0.023293767124414444, ref_abs_avg=45.09787368774414, test_abs_avg=45.101531982421875
production_forward2 grad[16] vs paper_forward: mean_abs=0.9549486041069031, max_abs=5.875, mean_rel=0.3113662898540497, max_rel=3374.999755859375, norm_rel=0.02139405533671379, ref_abs_avg=44.8675651550293, test_abs_avg=44.87144470214844
production_forward2 grad[17] vs paper_forward: mean_abs=0.8066422939300537, max_abs=3.0, mean_rel=0.12833641469478607, max_rel=6.352704048156738, norm_rel=0.023813797160983086, ref_abs_avg=32.931854248046875, test_abs_avg=32.95733642578125
production_forward2 grad[18] vs paper_forward: mean_abs=0.9884559512138367, max_abs=7.5, mean_rel=0.14426550269126892, max_rel=1344.5523681640625, norm_rel=0.023196537047624588, ref_abs_avg=42.893348693847656, test_abs_avg=42.89463806152344
production_forward2 grad[19] vs paper_forward: mean_abs=0.9055849313735962, max_abs=5.25, mean_rel=0.33344003558158875, max_rel=3187.499755859375, norm_rel=0.021546470001339912, ref_abs_avg=42.25879669189453, test_abs_avg=42.263999938964844
production_forward2 grad[20] vs paper_forward: mean_abs=0.7079133987426758, max_abs=2.75, mean_rel=0.10764683783054352, max_rel=12.152777671813965, norm_rel=0.02100117690861225, ref_abs_avg=34.0833854675293, test_abs_avg=34.044830322265625
production_forward2 grad[21] vs paper_forward: mean_abs=0.9357087016105652, max_abs=6.5, mean_rel=0.15282970666885376, max_rel=1939.208251953125, norm_rel=0.023107275366783142, ref_abs_avg=40.76611328125, test_abs_avg=40.770301818847656
production_forward2 grad[22] vs paper_forward: mean_abs=0.8596898317337036, max_abs=5.5, mean_rel=0.2913442850112915, max_rel=2562.5, norm_rel=0.0215216726064682, ref_abs_avg=40.17995834350586, test_abs_avg=40.17676544189453
production_forward2 grad[23] vs paper_forward: mean_abs=0.677777886390686, max_abs=2.78125, mean_rel=0.17813047766685486, max_rel=44.05498504638672, norm_rel=0.02269512601196766, ref_abs_avg=30.373912811279297, test_abs_avg=30.40947151184082
production_forward2 grad[24] vs paper_forward: mean_abs=0.8885911703109741, max_abs=6.0, mean_rel=0.136476069688797, max_rel=1336.8367919921875, norm_rel=0.022996414452791214, ref_abs_avg=38.88275909423828, test_abs_avg=38.8861083984375
production_forward2 grad[25] vs paper_forward: mean_abs=0.8176523447036743, max_abs=5.25, mean_rel=0.2549630403518677, max_rel=2624.999755859375, norm_rel=0.021299609914422035, ref_abs_avg=38.57349395751953, test_abs_avg=38.57208251953125
production_forward2 grad[26] vs paper_forward: mean_abs=0.7982873916625977, max_abs=3.6875, mean_rel=0.12357820570468903, max_rel=13.689048767089844, norm_rel=0.023302968591451645, ref_abs_avg=34.018409729003906, test_abs_avg=34.10856628417969
production_forward2 grad[27] vs paper_forward: mean_abs=1.0265858173370361, max_abs=8.0, mean_rel=0.16137190163135529, max_rel=2671.455810546875, norm_rel=0.02472255565226078, ref_abs_avg=41.744056701660156, test_abs_avg=41.74654769897461
production_forward2 grad[28] vs paper_forward: mean_abs=0.9495595693588257, max_abs=6.0, mean_rel=0.31867605447769165, max_rel=2812.499755859375, norm_rel=0.023409822955727577, ref_abs_avg=40.784915924072266, test_abs_avg=40.78733825683594
production_forward2 grad[29] vs paper_forward: mean_abs=0.6654243469238281, max_abs=2.625, mean_rel=0.10148084908723831, max_rel=17.097623825073242, norm_rel=0.02201872132718563, ref_abs_avg=30.983116149902344, test_abs_avg=30.967876434326172
production_forward2 grad[30] vs paper_forward: mean_abs=0.9501780271530151, max_abs=6.0, mean_rel=0.15970909595489502, max_rel=1020.16845703125, norm_rel=0.025140181183815002, ref_abs_avg=38.00835418701172, test_abs_avg=38.010826110839844
production_forward2 grad[31] vs paper_forward: mean_abs=0.8834102749824524, max_abs=5.21875, mean_rel=0.3154401183128357, max_rel=2593.749755859375, norm_rel=0.023636143654584885, ref_abs_avg=37.475345611572266, test_abs_avg=37.47502136230469
production_forward2 grad[32] vs paper_forward: mean_abs=0.6873229146003723, max_abs=2.5, mean_rel=0.0921289324760437, max_rel=4.288970470428467, norm_rel=0.023398593068122864, ref_abs_avg=29.158878326416016, test_abs_avg=29.140731811523438
production_forward2 grad[33] vs paper_forward: mean_abs=0.8910493850708008, max_abs=7.0, mean_rel=0.16208377480506897, max_rel=2007.0423583984375, norm_rel=0.025063328444957733, ref_abs_avg=35.724815368652344, test_abs_avg=35.72737503051758
production_forward2 grad[34] vs paper_forward: mean_abs=0.8299530744552612, max_abs=5.125, mean_rel=0.30357372760772705, max_rel=3062.499755859375, norm_rel=0.02363666333258152, ref_abs_avg=35.24349594116211, test_abs_avg=35.25129699707031
production_forward2 grad[35] vs paper_forward: mean_abs=0.6599171161651611, max_abs=2.5, mean_rel=0.2849936783313751, max_rel=63.034088134765625, norm_rel=0.02570251002907753, ref_abs_avg=25.96175765991211, test_abs_avg=25.953571319580078
production_forward2 grad[36] vs paper_forward: mean_abs=0.8390803933143616, max_abs=5.5, mean_rel=0.16254594922065735, max_rel=905.0597534179688, norm_rel=0.024939745664596558, ref_abs_avg=33.78959655761719, test_abs_avg=33.793556213378906
production_forward2 grad[37] vs paper_forward: mean_abs=0.7751044034957886, max_abs=5.5, mean_rel=0.264453649520874, max_rel=2656.249755859375, norm_rel=0.02326277457177639, ref_abs_avg=33.39024353027344, test_abs_avg=33.393978118896484
production_forward2 grad[38] vs paper_forward: mean_abs=0.6250369548797607, max_abs=2.5, mean_rel=0.33295297622680664, max_rel=104.87358093261719, norm_rel=0.02371005155146122, ref_abs_avg=26.99566078186035, test_abs_avg=27.026615142822266
production_forward2 grad[39] vs paper_forward: mean_abs=0.7915717363357544, max_abs=6.0, mean_rel=0.17152482271194458, max_rel=1809.5540771484375, norm_rel=0.02446706034243107, ref_abs_avg=32.49394607543945, test_abs_avg=32.493194580078125
production_forward2 grad[40] vs paper_forward: mean_abs=0.7327830791473389, max_abs=4.6875, mean_rel=0.23540902137756348, max_rel=2125.0, norm_rel=0.0232014711946249, ref_abs_avg=31.66717529296875, test_abs_avg=31.663177490234375
production_forward2 grad[41] vs paper_forward: mean_abs=0.5815401077270508, max_abs=2.125, mean_rel=0.09836018085479736, max_rel=8.213109016418457, norm_rel=0.02260088548064232, ref_abs_avg=25.971385955810547, test_abs_avg=26.005319595336914
production_forward2 grad[42] vs paper_forward: mean_abs=0.7451515197753906, max_abs=5.0, mean_rel=0.15504783391952515, max_rel=1454.02783203125, norm_rel=0.02435941994190216, ref_abs_avg=30.73038101196289, test_abs_avg=30.73143196105957
production_forward2 grad[43] vs paper_forward: mean_abs=0.6994441747665405, max_abs=4.25, mean_rel=0.26694920659065247, max_rel=2312.5, norm_rel=0.0230089258402586, ref_abs_avg=30.47168731689453, test_abs_avg=30.47878646850586
production_forward2 grad[44] vs paper_forward: mean_abs=0.5448967814445496, max_abs=3.25, mean_rel=0.12054488807916641, max_rel=13.119872093200684, norm_rel=0.02319372072815895, ref_abs_avg=24.20584487915039, test_abs_avg=24.188495635986328
production_forward2 grad[45] vs paper_forward: mean_abs=0.7148041725158691, max_abs=5.0, mean_rel=0.1530895084142685, max_rel=1836.5533447265625, norm_rel=0.02422626130282879, ref_abs_avg=29.643056869506836, test_abs_avg=29.645980834960938
production_forward2 grad[46] vs paper_forward: mean_abs=0.664910078048706, max_abs=4.3125, mean_rel=0.25530487298965454, max_rel=1734.3748779296875, norm_rel=0.022769156843423843, ref_abs_avg=29.314102172851562, test_abs_avg=29.31247329711914
production_forward2 grad[47] vs paper_forward: mean_abs=0.5133028030395508, max_abs=2.0, mean_rel=0.06880110502243042, max_rel=2.620873212814331, norm_rel=0.022559380158782005, ref_abs_avg=23.237648010253906, test_abs_avg=23.219966888427734
production_forward2 grad[48] vs paper_forward: mean_abs=0.6882851123809814, max_abs=5.0, mean_rel=0.15166646242141724, max_rel=860.6349487304688, norm_rel=0.02403740957379341, ref_abs_avg=28.746768951416016, test_abs_avg=28.7486572265625
production_forward2 grad[49] vs paper_forward: mean_abs=0.6369038820266724, max_abs=4.25, mean_rel=0.2322026789188385, max_rel=1953.1248779296875, norm_rel=0.022314956411719322, ref_abs_avg=28.5910587310791, test_abs_avg=28.590362548828125
production_forward2 grad[50] vs paper_forward: mean_abs=0.5485097169876099, max_abs=2.15625, mean_rel=0.13942626118659973, max_rel=30.397497177124023, norm_rel=0.024841150268912315, ref_abs_avg=22.66551399230957, test_abs_avg=22.635353088378906
production_forward2 grad[51] vs paper_forward: mean_abs=0.7466393709182739, max_abs=5.75, mean_rel=0.16022330522537231, max_rel=956.3894653320312, norm_rel=0.02527124434709549, ref_abs_avg=29.636146545410156, test_abs_avg=29.63681411743164
production_forward2 grad[52] vs paper_forward: mean_abs=0.692028284072876, max_abs=4.5, mean_rel=0.24968963861465454, max_rel=1999.9998779296875, norm_rel=0.023711657151579857, ref_abs_avg=29.26289176940918, test_abs_avg=29.265090942382812
production_forward2 grad[53] vs paper_forward: mean_abs=0.5419726371765137, max_abs=2.0, mean_rel=0.14410850405693054, max_rel=19.96592903137207, norm_rel=0.024095989763736725, ref_abs_avg=22.219139099121094, test_abs_avg=22.277015686035156
production_forward2 grad[54] vs paper_forward: mean_abs=0.6879128217697144, max_abs=4.5, mean_rel=0.17121174931526184, max_rel=1513.9661865234375, norm_rel=0.025002634152770042, ref_abs_avg=27.593563079833984, test_abs_avg=27.59238052368164
production_forward2 grad[55] vs paper_forward: mean_abs=0.6414936780929565, max_abs=4.5, mean_rel=0.29263705015182495, max_rel=2468.75, norm_rel=0.023651454597711563, ref_abs_avg=27.2285213470459, test_abs_avg=27.23308563232422
production_forward2 grad[56] vs paper_forward: mean_abs=0.49132537841796875, max_abs=2.25, mean_rel=0.11297635734081268, max_rel=12.693809509277344, norm_rel=0.022135592997074127, ref_abs_avg=21.958545684814453, test_abs_avg=21.939292907714844
production_forward2 grad[57] vs paper_forward: mean_abs=0.6461131572723389, max_abs=4.5, mean_rel=0.16488324105739594, max_rel=954.01904296875, norm_rel=0.02453557401895523, ref_abs_avg=26.36745262145996, test_abs_avg=26.366912841796875
production_forward2 grad[58] vs paper_forward: mean_abs=0.6004374027252197, max_abs=3.75, mean_rel=0.2736307680606842, max_rel=2656.249755859375, norm_rel=0.02305580861866474, ref_abs_avg=26.071117401123047, test_abs_avg=26.075775146484375
production_forward2 grad[59] vs paper_forward: mean_abs=0.4968385696411133, max_abs=1.78125, mean_rel=0.06810091435909271, max_rel=2.254194736480713, norm_rel=0.024995049461722374, ref_abs_avg=20.362323760986328, test_abs_avg=20.3835391998291
production_forward2 grad[60] vs paper_forward: mean_abs=0.6089609861373901, max_abs=5.0, mean_rel=0.15966203808784485, max_rel=1228.7672119140625, norm_rel=0.02422192320227623, ref_abs_avg=25.203224182128906, test_abs_avg=25.204166412353516
production_forward2 grad[61] vs paper_forward: mean_abs=0.5664370059967041, max_abs=3.5, mean_rel=0.2380768060684204, max_rel=1898.4373779296875, norm_rel=0.022365175187587738, ref_abs_avg=25.356918334960938, test_abs_avg=25.3582763671875
production_forward2 grad[62] vs paper_forward: mean_abs=0.43446874618530273, max_abs=1.75, mean_rel=0.10957174748182297, max_rel=26.669151306152344, norm_rel=0.022054927423596382, ref_abs_avg=20.063735961914062, test_abs_avg=20.06146812438965
production_forward2 grad[63] vs paper_forward: mean_abs=0.5849381685256958, max_abs=5.4375, mean_rel=0.15594696998596191, max_rel=1039.7362060546875, norm_rel=0.023560622707009315, ref_abs_avg=24.836318969726562, test_abs_avg=24.834163665771484
production_forward2 grad[64] vs paper_forward: mean_abs=0.5380040407180786, max_abs=3.84375, mean_rel=0.2162172794342041, max_rel=1999.9998779296875, norm_rel=0.022228822112083435, ref_abs_avg=24.186899185180664, test_abs_avg=24.18622398376465
production_forward2 grad[65] vs paper_forward: mean_abs=0.4121628403663635, max_abs=1.6875, mean_rel=0.32641541957855225, max_rel=118.98787689208984, norm_rel=0.02039211057126522, ref_abs_avg=20.521976470947266, test_abs_avg=20.460386276245117
production_forward2 grad[66] vs paper_forward: mean_abs=0.5541481375694275, max_abs=4.0, mean_rel=0.14688192307949066, max_rel=870.412353515625, norm_rel=0.023525241762399673, ref_abs_avg=23.60074234008789, test_abs_avg=23.601715087890625
production_forward2 grad[67] vs paper_forward: mean_abs=0.5141832232475281, max_abs=3.625, mean_rel=0.2132902294397354, max_rel=1343.7498779296875, norm_rel=0.022003693506121635, ref_abs_avg=23.368793487548828, test_abs_avg=23.36585235595703
production_forward2 grad[68] vs paper_forward: mean_abs=0.3952066898345947, max_abs=1.625, mean_rel=0.14978209137916565, max_rel=22.21766471862793, norm_rel=0.02061288245022297, ref_abs_avg=19.58149528503418, test_abs_avg=19.54160499572754
production_forward2 grad[69] vs paper_forward: mean_abs=0.5256715416908264, max_abs=4.0, mean_rel=0.14640313386917114, max_rel=653.61083984375, norm_rel=0.023196058347821236, ref_abs_avg=22.731292724609375, test_abs_avg=22.73199462890625
production_forward2 grad[70] vs paper_forward: mean_abs=0.4901716113090515, max_abs=4.5, mean_rel=0.233384907245636, max_rel=1874.9998779296875, norm_rel=0.0217482578009367, ref_abs_avg=22.597606658935547, test_abs_avg=22.59881591796875
production_forward2 grad[71] vs paper_forward: mean_abs=0.3918423652648926, max_abs=1.8125, mean_rel=0.13320192694664001, max_rel=16.73977279663086, norm_rel=0.02201785147190094, ref_abs_avg=18.31340217590332, test_abs_avg=18.3372802734375
production_forward2 grad[72] vs paper_forward: mean_abs=0.5071658492088318, max_abs=4.25, mean_rel=0.1493939310312271, max_rel=1389.4390869140625, norm_rel=0.02285759709775448, ref_abs_avg=22.219133377075195, test_abs_avg=22.219589233398438
production_forward2 grad[73] vs paper_forward: mean_abs=0.4720410108566284, max_abs=3.5, mean_rel=0.20424224436283112, max_rel=1125.0, norm_rel=0.02151854708790779, ref_abs_avg=21.963817596435547, test_abs_avg=21.973812103271484
production_forward2 grad[74] vs paper_forward: mean_abs=0.46495532989501953, max_abs=1.75, mean_rel=0.14420993626117706, max_rel=18.000877380371094, norm_rel=0.023633508011698723, ref_abs_avg=19.466127395629883, test_abs_avg=19.475162506103516
production_forward2 grad[75] vs paper_forward: mean_abs=0.5779284834861755, max_abs=5.0, mean_rel=0.15417766571044922, max_rel=651.0836791992188, norm_rel=0.024025367572903633, ref_abs_avg=24.101314544677734, test_abs_avg=24.100753784179688
production_forward2 grad[76] vs paper_forward: mean_abs=0.5356684923171997, max_abs=4.125, mean_rel=0.23901447653770447, max_rel=1437.4998779296875, norm_rel=0.0228184312582016, ref_abs_avg=23.581501007080078, test_abs_avg=23.58252716064453
production_forward2 grad[77] vs paper_forward: mean_abs=0.4239187240600586, max_abs=2.0625, mean_rel=0.10545726120471954, max_rel=9.985274314880371, norm_rel=0.023617740720510483, ref_abs_avg=17.751361846923828, test_abs_avg=17.76169776916504
production_forward2 grad[78] vs paper_forward: mean_abs=0.5261026620864868, max_abs=4.25, mean_rel=0.15195520222187042, max_rel=954.0471801757812, norm_rel=0.023483937606215477, ref_abs_avg=22.486791610717773, test_abs_avg=22.48755645751953
production_forward2 grad[79] vs paper_forward: mean_abs=0.489912748336792, max_abs=4.0, mean_rel=0.21654853224754333, max_rel=1281.25, norm_rel=0.02200867421925068, ref_abs_avg=22.341251373291016, test_abs_avg=22.33966827392578
production_forward2 grad[80] vs paper_forward: mean_abs=0.37285715341567993, max_abs=1.625, mean_rel=0.629192590713501, max_rel=273.3591003417969, norm_rel=0.02124684490263462, ref_abs_avg=17.880924224853516, test_abs_avg=17.891809463500977
production_forward2 grad[81] vs paper_forward: mean_abs=0.4896974563598633, max_abs=5.0, mean_rel=0.14852140843868256, max_rel=999.0321044921875, norm_rel=0.022881077602505684, ref_abs_avg=21.472576141357422, test_abs_avg=21.47152328491211
production_forward2 grad[82] vs paper_forward: mean_abs=0.444197416305542, max_abs=3.875, mean_rel=0.1753196120262146, max_rel=1343.7498779296875, norm_rel=0.021095946431159973, ref_abs_avg=21.047569274902344, test_abs_avg=21.04400634765625
production_forward2 grad[83] vs paper_forward: mean_abs=0.35048341751098633, max_abs=1.375, mean_rel=0.07673291862010956, max_rel=2.2230780124664307, norm_rel=0.02258709818124771, ref_abs_avg=15.498505592346191, test_abs_avg=15.461387634277344
production_forward2 grad[84] vs paper_forward: mean_abs=0.45590996742248535, max_abs=4.0, mean_rel=0.1383208930492401, max_rel=1273.8724365234375, norm_rel=0.022293563932180405, ref_abs_avg=20.542238235473633, test_abs_avg=20.5426025390625
production_forward2 grad[85] vs paper_forward: mean_abs=0.41605907678604126, max_abs=3.75, mean_rel=0.1809394210577011, max_rel=1273.4375, norm_rel=0.02055245079100132, ref_abs_avg=20.267139434814453, test_abs_avg=20.26219940185547
production_forward2 grad[86] vs paper_forward: mean_abs=0.3617713451385498, max_abs=1.25, mean_rel=0.19818046689033508, max_rel=40.98674011230469, norm_rel=0.021856291219592094, ref_abs_avg=16.077983856201172, test_abs_avg=16.08685874938965
production_forward2 grad[87] vs paper_forward: mean_abs=0.4332053065299988, max_abs=4.0, mean_rel=0.13161945343017578, max_rel=845.0455932617188, norm_rel=0.021887151524424553, ref_abs_avg=19.92452621459961, test_abs_avg=19.924142837524414
production_forward2 grad[88] vs paper_forward: mean_abs=0.40167680382728577, max_abs=4.0, mean_rel=0.18090969324111938, max_rel=1281.25, norm_rel=0.02067462168633938, ref_abs_avg=19.628000259399414, test_abs_avg=19.6345272064209
production_forward2 grad[89] vs paper_forward: mean_abs=0.30411720275878906, max_abs=1.25, mean_rel=0.06484643369913101, max_rel=3.0731563568115234, norm_rel=0.01928488165140152, ref_abs_avg=15.84916877746582, test_abs_avg=15.85599136352539
production_forward2 grad[90] vs paper_forward: mean_abs=0.401860773563385, max_abs=4.0, mean_rel=0.12633833289146423, max_rel=655.1764526367188, norm_rel=0.021370666101574898, ref_abs_avg=18.983341217041016, test_abs_avg=18.982128143310547
production_forward2 grad[91] vs paper_forward: mean_abs=0.3722546696662903, max_abs=4.25, mean_rel=0.19745847582817078, max_rel=2093.75, norm_rel=0.01969139836728573, ref_abs_avg=19.128070831298828, test_abs_avg=19.125154495239258
production_forward2 grad[92] vs paper_forward: mean_abs=0.2981194257736206, max_abs=1.25, mean_rel=0.08256278932094574, max_rel=6.447152137756348, norm_rel=0.01764027774333954, ref_abs_avg=17.263586044311523, test_abs_avg=17.263702392578125
production_forward2 grad[93] vs paper_forward: mean_abs=0.38473665714263916, max_abs=4.75, mean_rel=0.11906027048826218, max_rel=939.914794921875, norm_rel=0.020790694281458855, ref_abs_avg=18.75077247619629, test_abs_avg=18.749496459960938
production_forward2 grad[94] vs paper_forward: mean_abs=0.35213586688041687, max_abs=3.25, mean_rel=0.17932522296905518, max_rel=1257.8125, norm_rel=0.01920969970524311, ref_abs_avg=18.536731719970703, test_abs_avg=18.538330078125
production_forward2 grad[95] vs paper_forward: mean_abs=0.28557920455932617, max_abs=1.03125, mean_rel=0.08527122437953949, max_rel=4.370003700256348, norm_rel=0.018349485471844673, ref_abs_avg=15.63560676574707, test_abs_avg=15.615863800048828
production_forward2 grad[96] vs paper_forward: mean_abs=0.36697056889533997, max_abs=5.0, mean_rel=0.12523186206817627, max_rel=778.3054809570312, norm_rel=0.020413100719451904, ref_abs_avg=18.308189392089844, test_abs_avg=18.30819320678711
production_forward2 grad[97] vs paper_forward: mean_abs=0.326904296875, max_abs=3.5, mean_rel=0.15047910809516907, max_rel=781.2499389648438, norm_rel=0.018140144646167755, ref_abs_avg=18.332237243652344, test_abs_avg=18.343576431274414
identity layers + randn queries
paper_forward fwd+bwd:  382.454 ms
paper_forward bwd-only: 302.200 ms
paper_forward peak allocated: fwd=29.706 GiB, fwd+bwd=31.825 GiB
paper_forward peak reserved:  fwd=29.723 GiB, fwd+bwd=32.473 GiB
production_forward fwd+bwd:  114.344 ms
production_forward bwd-only: 96.074 ms
production_forward peak allocated: fwd=2.192 GiB, fwd+bwd=10.196 GiB
production_forward peak reserved:  fwd=2.303 GiB, fwd+bwd=10.303 GiB
torch_compile_phases_forward fwd+bwd:  166.994 ms
torch_compile_phases_forward bwd-only: 132.660 ms
torch_compile_phases_forward peak allocated: fwd=12.781 GiB, fwd+bwd=13.409 GiB
torch_compile_phases_forward peak reserved:  fwd=13.078 GiB, fwd+bwd=17.330 GiB
production_forward2 fwd+bwd:  113.549 ms
production_forward2 bwd-only: 95.986 ms
production_forward2 peak allocated: fwd=3.071 GiB, fwd+bwd=10.196 GiB
production_forward2 peak reserved:  fwd=3.303 GiB, fwd+bwd=11.303 GiB

grads check for swiglu layers + randn queries
production_forward vs paper_forward output: mean_abs=0.0016450458206236362, max_abs=0.046875
production_forward grad[0] vs paper_forward: mean_abs=0.008444223552942276, max_abs=0.359375, mean_rel=0.07291826605796814, max_rel=108.71244812011719, norm_rel=0.019977720454335213, ref_abs_avg=0.45862895250320435, test_abs_avg=0.458646297454834
production_forward grad[1] vs paper_forward: mean_abs=7.226812362670898, max_abs=64.0, mean_rel=0.18118233978748322, max_rel=437.25360107421875, norm_rel=0.019989054650068283, ref_abs_avg=319.58984375, test_abs_avg=319.68511962890625
production_forward grad[2] vs paper_forward: mean_abs=1.2596702575683594, max_abs=5.0, mean_rel=0.10924942046403885, max_rel=10.911587715148926, norm_rel=0.02240041270852089, ref_abs_avg=57.5374755859375, test_abs_avg=57.495216369628906
production_forward grad[3] vs paper_forward: mean_abs=1.6259660720825195, max_abs=10.0, mean_rel=0.19086840748786926, max_rel=2048.8388671875, norm_rel=0.024516236037015915, ref_abs_avg=66.72473907470703, test_abs_avg=66.73091125488281
production_forward grad[4] vs paper_forward: mean_abs=1.4956934452056885, max_abs=10.0, mean_rel=0.4647892713546753, max_rel=5187.49951171875, norm_rel=0.02289823815226555, ref_abs_avg=65.64997863769531, test_abs_avg=65.64208984375
production_forward grad[5] vs paper_forward: mean_abs=1.1276391744613647, max_abs=4.375, mean_rel=0.1355341225862503, max_rel=15.180414199829102, norm_rel=0.02217925153672695, ref_abs_avg=51.40839385986328, test_abs_avg=51.434730529785156
production_forward grad[6] vs paper_forward: mean_abs=1.423633337020874, max_abs=10.0, mean_rel=0.16240772604942322, max_rel=2217.306884765625, norm_rel=0.024230744689702988, ref_abs_avg=59.105438232421875, test_abs_avg=59.11161804199219
production_forward grad[7] vs paper_forward: mean_abs=1.3101398944854736, max_abs=7.75, mean_rel=0.3714177906513214, max_rel=4500.0, norm_rel=0.02262510173022747, ref_abs_avg=58.19281005859375, test_abs_avg=58.196571350097656
production_forward grad[8] vs paper_forward: mean_abs=1.0055952072143555, max_abs=4.0, mean_rel=0.1481872946023941, max_rel=26.975772857666016, norm_rel=0.022552169859409332, ref_abs_avg=45.517913818359375, test_abs_avg=45.54708480834961
production_forward grad[9] vs paper_forward: mean_abs=1.2968623638153076, max_abs=8.0, mean_rel=0.16645529866218567, max_rel=1637.5213623046875, norm_rel=0.024080118164420128, ref_abs_avg=54.241798400878906, test_abs_avg=54.24333190917969
production_forward grad[10] vs paper_forward: mean_abs=1.1908769607543945, max_abs=7.0, mean_rel=0.3844758868217468, max_rel=3624.999755859375, norm_rel=0.02244643121957779, ref_abs_avg=53.372596740722656, test_abs_avg=53.36600112915039
production_forward grad[11] vs paper_forward: mean_abs=0.894012451171875, max_abs=3.75, mean_rel=0.10231302678585052, max_rel=7.727711200714111, norm_rel=0.02253379113972187, ref_abs_avg=39.64055252075195, test_abs_avg=39.665897369384766
production_forward grad[12] vs paper_forward: mean_abs=1.1893693208694458, max_abs=8.0, mean_rel=0.14940118789672852, max_rel=1579.0809326171875, norm_rel=0.023906022310256958, ref_abs_avg=50.06644821166992, test_abs_avg=50.07368469238281
production_forward grad[13] vs paper_forward: mean_abs=1.0898501873016357, max_abs=6.25, mean_rel=0.2754914164543152, max_rel=3374.999755859375, norm_rel=0.022363299503922462, ref_abs_avg=49.023536682128906, test_abs_avg=49.02763366699219
production_forward grad[14] vs paper_forward: mean_abs=0.8395442962646484, max_abs=3.375, mean_rel=0.10179779678583145, max_rel=8.721981048583984, norm_rel=0.02189335972070694, ref_abs_avg=40.03599548339844, test_abs_avg=40.02447509765625
production_forward grad[15] vs paper_forward: mean_abs=1.107560396194458, max_abs=7.4375, mean_rel=0.1745210886001587, max_rel=2642.03515625, norm_rel=0.02385355904698372, ref_abs_avg=46.669776916503906, test_abs_avg=46.67433166503906
production_forward grad[16] vs paper_forward: mean_abs=1.0232365131378174, max_abs=6.0, mean_rel=0.28527897596359253, max_rel=2999.999755859375, norm_rel=0.022283319383859634, ref_abs_avg=46.14007568359375, test_abs_avg=46.14365768432617
production_forward grad[17] vs paper_forward: mean_abs=0.7955845594406128, max_abs=3.0, mean_rel=0.07675979286432266, max_rel=2.3553762435913086, norm_rel=0.021417830139398575, ref_abs_avg=36.9268684387207, test_abs_avg=36.88798522949219
production_forward grad[18] vs paper_forward: mean_abs=1.047884464263916, max_abs=7.5, mean_rel=0.1500939130783081, max_rel=1413.284912109375, norm_rel=0.023753877729177475, ref_abs_avg=44.39404296875, test_abs_avg=44.393898010253906
production_forward grad[19] vs paper_forward: mean_abs=0.9591789245605469, max_abs=6.0, mean_rel=0.2459820806980133, max_rel=2656.249755859375, norm_rel=0.022092368453741074, ref_abs_avg=43.630287170410156, test_abs_avg=43.629825592041016
production_forward grad[20] vs paper_forward: mean_abs=0.7607011795043945, max_abs=3.125, mean_rel=0.08494707942008972, max_rel=3.55193829536438, norm_rel=0.021660180762410164, ref_abs_avg=36.0401496887207, test_abs_avg=35.99089813232422
production_forward grad[21] vs paper_forward: mean_abs=0.991024374961853, max_abs=8.0, mean_rel=0.15867391228675842, max_rel=1105.81884765625, norm_rel=0.02361636608839035, ref_abs_avg=42.223487854003906, test_abs_avg=42.22532653808594
production_forward grad[22] vs paper_forward: mean_abs=0.9080257415771484, max_abs=5.125, mean_rel=0.3100520372390747, max_rel=2656.249755859375, norm_rel=0.02185431309044361, ref_abs_avg=41.80931854248047, test_abs_avg=41.81483459472656
production_forward grad[23] vs paper_forward: mean_abs=0.7391000986099243, max_abs=3.0, mean_rel=0.08922338485717773, max_rel=10.747628211975098, norm_rel=0.0229354128241539, ref_abs_avg=32.21147918701172, test_abs_avg=32.22510528564453
production_forward grad[24] vs paper_forward: mean_abs=0.9428016543388367, max_abs=7.0, mean_rel=0.15350931882858276, max_rel=984.1834106445312, norm_rel=0.023569198325276375, ref_abs_avg=40.265785217285156, test_abs_avg=40.266658782958984
production_forward grad[25] vs paper_forward: mean_abs=0.8636548519134521, max_abs=5.5, mean_rel=0.2587338089942932, max_rel=2874.999755859375, norm_rel=0.021627215668559074, ref_abs_avg=40.1478271484375, test_abs_avg=40.153507232666016
production_forward grad[26] vs paper_forward: mean_abs=0.8611125946044922, max_abs=3.25, mean_rel=0.17637789249420166, max_rel=44.40983200073242, norm_rel=0.02486223168671131, ref_abs_avg=34.599708557128906, test_abs_avg=34.676963806152344
production_forward grad[27] vs paper_forward: mean_abs=1.1069390773773193, max_abs=7.3515625, mean_rel=0.1787773072719574, max_rel=1597.5428466796875, norm_rel=0.025431375950574875, ref_abs_avg=43.76228713989258, test_abs_avg=43.76323699951172
production_forward grad[28] vs paper_forward: mean_abs=1.0221340656280518, max_abs=6.0, mean_rel=0.37479931116104126, max_rel=3499.999755859375, norm_rel=0.023845532909035683, ref_abs_avg=43.04057312011719, test_abs_avg=43.05241775512695
production_forward grad[29] vs paper_forward: mean_abs=0.8396158218383789, max_abs=3.75, mean_rel=0.11097751557826996, max_rel=6.206326007843018, norm_rel=0.025917481631040573, ref_abs_avg=31.86901092529297, test_abs_avg=31.98552703857422
production_forward grad[30] vs paper_forward: mean_abs=1.0189509391784668, max_abs=7.0, mean_rel=0.16873076558113098, max_rel=1642.9068603515625, norm_rel=0.025670763105154037, ref_abs_avg=39.86585998535156, test_abs_avg=39.867340087890625
production_forward grad[31] vs paper_forward: mean_abs=0.9449675679206848, max_abs=6.0, mean_rel=0.3009238839149475, max_rel=3124.999755859375, norm_rel=0.024264030158519745, ref_abs_avg=39.160133361816406, test_abs_avg=39.161231994628906
production_forward grad[32] vs paper_forward: mean_abs=0.7429039478302002, max_abs=3.0, mean_rel=0.19312244653701782, max_rel=40.179405212402344, norm_rel=0.025258654728531837, ref_abs_avg=30.129499435424805, test_abs_avg=30.148435592651367
production_forward grad[33] vs paper_forward: mean_abs=0.9358169436454773, max_abs=6.5, mean_rel=0.1653681993484497, max_rel=1411.955078125, norm_rel=0.02549894154071808, ref_abs_avg=36.85504150390625, test_abs_avg=36.855621337890625
production_forward grad[34] vs paper_forward: mean_abs=0.8686946630477905, max_abs=5.5, mean_rel=0.3022899627685547, max_rel=2796.874755859375, norm_rel=0.023978305980563164, ref_abs_avg=36.38905715942383, test_abs_avg=36.3893928527832
production_forward grad[35] vs paper_forward: mean_abs=0.7178869247436523, max_abs=2.75, mean_rel=0.13978509604930878, max_rel=15.566601753234863, norm_rel=0.024263840168714523, ref_abs_avg=29.148338317871094, test_abs_avg=29.23172950744629
production_forward grad[36] vs paper_forward: mean_abs=0.8721818327903748, max_abs=6.0, mean_rel=0.16424931585788727, max_rel=1781.2498779296875, norm_rel=0.025211911648511887, ref_abs_avg=34.73957443237305, test_abs_avg=34.73963928222656
production_forward grad[37] vs paper_forward: mean_abs=0.8176021575927734, max_abs=5.0, mean_rel=0.3415539860725403, max_rel=2765.624755859375, norm_rel=0.02386891283094883, ref_abs_avg=34.361305236816406, test_abs_avg=34.36894989013672
production_forward grad[38] vs paper_forward: mean_abs=0.6597189903259277, max_abs=3.0, mean_rel=0.08497805893421173, max_rel=5.065256118774414, norm_rel=0.024234065786004066, ref_abs_avg=27.240062713623047, test_abs_avg=27.20521354675293
production_forward grad[39] vs paper_forward: mean_abs=0.820478081703186, max_abs=6.0, mean_rel=0.17517031729221344, max_rel=2112.134521484375, norm_rel=0.024945829063653946, ref_abs_avg=33.037872314453125, test_abs_avg=33.041748046875
production_forward grad[40] vs paper_forward: mean_abs=0.7688724994659424, max_abs=4.75, mean_rel=0.287550687789917, max_rel=2093.75, norm_rel=0.02357896789908409, ref_abs_avg=32.68537521362305, test_abs_avg=32.69076919555664
production_forward grad[41] vs paper_forward: mean_abs=0.5996084213256836, max_abs=2.6875, mean_rel=0.09897983074188232, max_rel=10.975088119506836, norm_rel=0.02301330491900444, ref_abs_avg=25.85531234741211, test_abs_avg=25.90089225769043
production_forward grad[42] vs paper_forward: mean_abs=0.7864789962768555, max_abs=5.25, mean_rel=0.15426817536354065, max_rel=1042.763427734375, norm_rel=0.024744749069213867, ref_abs_avg=31.90638542175293, test_abs_avg=31.90713882446289
production_forward grad[43] vs paper_forward: mean_abs=0.7268680334091187, max_abs=4.25, mean_rel=0.22773152589797974, max_rel=2328.125, norm_rel=0.022996140643954277, ref_abs_avg=31.695571899414062, test_abs_avg=31.697603225708008
production_forward grad[44] vs paper_forward: mean_abs=0.5784702301025391, max_abs=2.15625, mean_rel=0.06968170404434204, max_rel=7.02699089050293, norm_rel=0.02246478945016861, ref_abs_avg=26.600418090820312, test_abs_avg=26.625524520874023
production_forward grad[45] vs paper_forward: mean_abs=0.7498276233673096, max_abs=5.0, mean_rel=0.1629452407360077, max_rel=739.204345703125, norm_rel=0.024426843971014023, ref_abs_avg=30.825637817382812, test_abs_avg=30.829425811767578
production_forward grad[46] vs paper_forward: mean_abs=0.6928145885467529, max_abs=4.625, mean_rel=0.25994303822517395, max_rel=2046.8748779296875, norm_rel=0.022868502885103226, ref_abs_avg=30.375349044799805, test_abs_avg=30.381025314331055
production_forward grad[47] vs paper_forward: mean_abs=0.5993366241455078, max_abs=2.5, mean_rel=0.1268533319234848, max_rel=13.248059272766113, norm_rel=0.024747209623456, ref_abs_avg=23.733657836914062, test_abs_avg=23.72007179260254
production_forward grad[48] vs paper_forward: mean_abs=0.7156991362571716, max_abs=5.5, mean_rel=0.15911340713500977, max_rel=1534.259033203125, norm_rel=0.024515923112630844, ref_abs_avg=29.31985855102539, test_abs_avg=29.320266723632812
production_forward grad[49] vs paper_forward: mean_abs=0.6645218133926392, max_abs=4.5, mean_rel=0.295804888010025, max_rel=2999.999755859375, norm_rel=0.022992664948105812, ref_abs_avg=28.959705352783203, test_abs_avg=28.96364974975586
production_forward grad[50] vs paper_forward: mean_abs=0.6299839019775391, max_abs=3.0, mean_rel=0.0766175240278244, max_rel=4.288547992706299, norm_rel=0.02280295453965664, ref_abs_avg=27.698610305786133, test_abs_avg=27.706409454345703
production_forward grad[51] vs paper_forward: mean_abs=0.8169194459915161, max_abs=6.0, mean_rel=0.17951367795467377, max_rel=2708.5703125, norm_rel=0.025728631764650345, ref_abs_avg=31.853595733642578, test_abs_avg=31.853858947753906
production_forward grad[52] vs paper_forward: mean_abs=0.7600614428520203, max_abs=5.25, mean_rel=0.257151335477829, max_rel=1937.4998779296875, norm_rel=0.024361014366149902, ref_abs_avg=31.298931121826172, test_abs_avg=31.299198150634766
production_forward grad[53] vs paper_forward: mean_abs=0.5825588703155518, max_abs=2.375, mean_rel=0.18777644634246826, max_rel=29.071075439453125, norm_rel=0.023476798087358475, ref_abs_avg=24.943328857421875, test_abs_avg=24.933591842651367
production_forward grad[54] vs paper_forward: mean_abs=0.7455097436904907, max_abs=5.25, mean_rel=0.16192248463630676, max_rel=1066.8984375, norm_rel=0.025485552847385406, ref_abs_avg=29.3315486907959, test_abs_avg=29.330503463745117
production_forward grad[55] vs paper_forward: mean_abs=0.6946426630020142, max_abs=4.5, mean_rel=0.2549383044242859, max_rel=2999.999755859375, norm_rel=0.02390996553003788, ref_abs_avg=29.098894119262695, test_abs_avg=29.100770950317383
production_forward grad[56] vs paper_forward: mean_abs=0.5296551585197449, max_abs=2.1875, mean_rel=0.6764824986457825, max_rel=295.7084655761719, norm_rel=0.022595779970288277, ref_abs_avg=23.756515502929688, test_abs_avg=23.782791137695312
production_forward grad[57] vs paper_forward: mean_abs=0.6954368352890015, max_abs=4.6171875, mean_rel=0.1506347954273224, max_rel=620.3104858398438, norm_rel=0.024914776906371117, ref_abs_avg=27.99838638305664, test_abs_avg=27.998584747314453
production_forward grad[58] vs paper_forward: mean_abs=0.6435440182685852, max_abs=4.0, mean_rel=0.22108350694179535, max_rel=2062.5, norm_rel=0.02323959767818451, ref_abs_avg=27.70272445678711, test_abs_avg=27.69784927368164
production_forward grad[59] vs paper_forward: mean_abs=0.5105729103088379, max_abs=2.0, mean_rel=0.09846585988998413, max_rel=7.207757472991943, norm_rel=0.02350657805800438, ref_abs_avg=22.490476608276367, test_abs_avg=22.57147216796875
production_forward grad[60] vs paper_forward: mean_abs=0.655887246131897, max_abs=6.0, mean_rel=0.15836197137832642, max_rel=873.6090087890625, norm_rel=0.024569246917963028, ref_abs_avg=26.725364685058594, test_abs_avg=26.724897384643555
production_forward grad[61] vs paper_forward: mean_abs=0.6035771369934082, max_abs=3.625, mean_rel=0.23585177958011627, max_rel=1624.9998779296875, norm_rel=0.023245856165885925, ref_abs_avg=26.035396575927734, test_abs_avg=26.0344295501709
production_forward grad[62] vs paper_forward: mean_abs=0.46455806493759155, max_abs=2.21484375, mean_rel=0.2135365903377533, max_rel=38.86530685424805, norm_rel=0.02269836701452732, ref_abs_avg=21.017480850219727, test_abs_avg=21.011913299560547
production_forward grad[63] vs paper_forward: mean_abs=0.6117106676101685, max_abs=4.75, mean_rel=0.15737250447273254, max_rel=1274.0281982421875, norm_rel=0.024192266166210175, ref_abs_avg=25.33633804321289, test_abs_avg=25.33696174621582
production_forward grad[64] vs paper_forward: mean_abs=0.571848452091217, max_abs=4.25, mean_rel=0.2677399516105652, max_rel=2062.5, norm_rel=0.022676125168800354, ref_abs_avg=25.24350929260254, test_abs_avg=25.247047424316406
production_forward grad[65] vs paper_forward: mean_abs=0.44769924879074097, max_abs=1.75, mean_rel=0.11184124648571014, max_rel=17.5877685546875, norm_rel=0.022599495947360992, ref_abs_avg=19.87627410888672, test_abs_avg=19.878864288330078
production_forward grad[66] vs paper_forward: mean_abs=0.575779914855957, max_abs=4.5, mean_rel=0.1509983390569687, max_rel=899.8472900390625, norm_rel=0.023787761107087135, ref_abs_avg=24.22720718383789, test_abs_avg=24.227108001708984
production_forward grad[67] vs paper_forward: mean_abs=0.5351546406745911, max_abs=3.5, mean_rel=0.20845156908035278, max_rel=1687.4998779296875, norm_rel=0.022254839539527893, ref_abs_avg=24.0753116607666, test_abs_avg=24.06705665588379
production_forward grad[68] vs paper_forward: mean_abs=0.44024014472961426, max_abs=1.6875, mean_rel=0.09916800260543823, max_rel=21.03569984436035, norm_rel=0.02273874543607235, ref_abs_avg=19.608596801757812, test_abs_avg=19.596824645996094
production_forward grad[69] vs paper_forward: mean_abs=0.545862078666687, max_abs=4.5, mean_rel=0.14423441886901855, max_rel=640.0526733398438, norm_rel=0.023300331085920334, ref_abs_avg=23.459110260009766, test_abs_avg=23.45794677734375
production_forward grad[70] vs paper_forward: mean_abs=0.5076391696929932, max_abs=3.75, mean_rel=0.2432079315185547, max_rel=1843.7498779296875, norm_rel=0.02173462137579918, ref_abs_avg=23.3150634765625, test_abs_avg=23.323680877685547
production_forward grad[71] vs paper_forward: mean_abs=0.40334010124206543, max_abs=1.625, mean_rel=0.07137727737426758, max_rel=5.575847625732422, norm_rel=0.021479258313775063, ref_abs_avg=19.372787475585938, test_abs_avg=19.391393661499023
production_forward grad[72] vs paper_forward: mean_abs=0.5224574208259583, max_abs=4.5, mean_rel=0.1474055051803589, max_rel=885.2582397460938, norm_rel=0.023011358454823494, ref_abs_avg=22.716182708740234, test_abs_avg=22.717512130737305
production_forward grad[73] vs paper_forward: mean_abs=0.4826923608779907, max_abs=3.28125, mean_rel=0.2218402624130249, max_rel=1296.8748779296875, norm_rel=0.02143637090921402, ref_abs_avg=22.518728256225586, test_abs_avg=22.522605895996094
production_forward grad[74] vs paper_forward: mean_abs=0.44571375846862793, max_abs=2.03125, mean_rel=0.1300632208585739, max_rel=23.277584075927734, norm_rel=0.02220657281577587, ref_abs_avg=20.40837287902832, test_abs_avg=20.425106048583984
production_forward grad[75] vs paper_forward: mean_abs=0.5677164793014526, max_abs=5.0, mean_rel=0.15734249353408813, max_rel=906.7312622070312, norm_rel=0.02436807006597519, ref_abs_avg=23.374399185180664, test_abs_avg=23.37647819519043
production_forward grad[76] vs paper_forward: mean_abs=0.525880753993988, max_abs=4.0, mean_rel=0.2838559150695801, max_rel=1882.8123779296875, norm_rel=0.022691944614052773, ref_abs_avg=23.21087074279785, test_abs_avg=23.214237213134766
production_forward grad[77] vs paper_forward: mean_abs=0.407109797000885, max_abs=1.96875, mean_rel=0.19996309280395508, max_rel=39.39223098754883, norm_rel=0.021604076027870178, ref_abs_avg=18.938186645507812, test_abs_avg=18.97374153137207
production_forward grad[78] vs paper_forward: mean_abs=0.5337204933166504, max_abs=5.25, mean_rel=0.14944647252559662, max_rel=1199.5548095703125, norm_rel=0.023769931867718697, ref_abs_avg=22.45486831665039, test_abs_avg=22.456188201904297
production_forward grad[79] vs paper_forward: mean_abs=0.4883383810520172, max_abs=3.5, mean_rel=0.2020980715751648, max_rel=2062.5, norm_rel=0.022205697372555733, ref_abs_avg=22.030048370361328, test_abs_avg=22.045757293701172
production_forward grad[80] vs paper_forward: mean_abs=0.3712242841720581, max_abs=1.46875, mean_rel=0.19196327030658722, max_rel=66.90557098388672, norm_rel=0.02072126232087612, ref_abs_avg=18.35239028930664, test_abs_avg=18.337129592895508
production_forward grad[81] vs paper_forward: mean_abs=0.49408891797065735, max_abs=4.5, mean_rel=0.15026932954788208, max_rel=768.6318969726562, norm_rel=0.023625437170267105, ref_abs_avg=20.979209899902344, test_abs_avg=20.980819702148438
production_forward grad[82] vs paper_forward: mean_abs=0.45477020740509033, max_abs=3.375, mean_rel=0.21095502376556396, max_rel=1374.9998779296875, norm_rel=0.02246205136179924, ref_abs_avg=20.310951232910156, test_abs_avg=20.31307601928711
production_forward grad[83] vs paper_forward: mean_abs=0.3718729019165039, max_abs=2.09375, mean_rel=0.0741443783044815, max_rel=4.621400356292725, norm_rel=0.022798284888267517, ref_abs_avg=16.32668685913086, test_abs_avg=16.31395721435547
production_forward grad[84] vs paper_forward: mean_abs=0.4614013433456421, max_abs=4.75, mean_rel=0.13822472095489502, max_rel=981.90234375, norm_rel=0.022734731435775757, ref_abs_avg=20.36647605895996, test_abs_avg=20.367225646972656
production_forward grad[85] vs paper_forward: mean_abs=0.41527193784713745, max_abs=3.5, mean_rel=0.20878344774246216, max_rel=1453.1248779296875, norm_rel=0.02091342583298683, ref_abs_avg=19.8358154296875, test_abs_avg=19.83414077758789
production_forward grad[86] vs paper_forward: mean_abs=0.3162655830383301, max_abs=1.1875, mean_rel=0.0676196962594986, max_rel=2.6755058765411377, norm_rel=0.01977410539984703, ref_abs_avg=15.952723503112793, test_abs_avg=15.948617935180664
production_forward grad[87] vs paper_forward: mean_abs=0.43499845266342163, max_abs=4.75, mean_rel=0.14265286922454834, max_rel=847.2681884765625, norm_rel=0.022417154163122177, ref_abs_avg=19.52667999267578, test_abs_avg=19.52743148803711
production_forward grad[88] vs paper_forward: mean_abs=0.3967609703540802, max_abs=3.5, mean_rel=0.18239206075668335, max_rel=1499.9998779296875, norm_rel=0.020325221121311188, ref_abs_avg=19.54051971435547, test_abs_avg=19.54108428955078
production_forward grad[89] vs paper_forward: mean_abs=0.32555070519447327, max_abs=1.552734375, mean_rel=0.3367338180541992, max_rel=135.13067626953125, norm_rel=0.02128916047513485, ref_abs_avg=15.598333358764648, test_abs_avg=15.592058181762695
production_forward grad[90] vs paper_forward: mean_abs=0.40474313497543335, max_abs=4.65625, mean_rel=0.13605904579162598, max_rel=605.9825439453125, norm_rel=0.021925203502178192, ref_abs_avg=18.595012664794922, test_abs_avg=18.595481872558594
production_forward grad[91] vs paper_forward: mean_abs=0.36872124671936035, max_abs=4.0, mean_rel=0.16477052867412567, max_rel=968.7499389648438, norm_rel=0.02046082727611065, ref_abs_avg=18.26932716369629, test_abs_avg=18.26389503479004
production_forward grad[92] vs paper_forward: mean_abs=0.31535983085632324, max_abs=1.25, mean_rel=0.09613369405269623, max_rel=3.8203701972961426, norm_rel=0.019977547228336334, ref_abs_avg=15.705026626586914, test_abs_avg=15.71048641204834
production_forward grad[93] vs paper_forward: mean_abs=0.39075565338134766, max_abs=4.75, mean_rel=0.12338507175445557, max_rel=469.1575012207031, norm_rel=0.02160273678600788, ref_abs_avg=18.29189682006836, test_abs_avg=18.292158126831055
production_forward grad[94] vs paper_forward: mean_abs=0.35021501779556274, max_abs=3.5, mean_rel=0.17179185152053833, max_rel=999.9999389648438, norm_rel=0.01961233653128147, ref_abs_avg=18.114578247070312, test_abs_avg=18.119138717651367
production_forward grad[95] vs paper_forward: mean_abs=0.28344061970710754, max_abs=1.25, mean_rel=0.21843037009239197, max_rel=69.2675552368164, norm_rel=0.02032732032239437, ref_abs_avg=14.120006561279297, test_abs_avg=14.121171951293945
production_forward grad[96] vs paper_forward: mean_abs=0.3628358244895935, max_abs=4.0, mean_rel=0.1305147409439087, max_rel=1157.9580078125, norm_rel=0.020963219925761223, ref_abs_avg=17.59404754638672, test_abs_avg=17.595088958740234
production_forward grad[97] vs paper_forward: mean_abs=0.3287043273448944, max_abs=3.0, mean_rel=0.19217319786548615, max_rel=2125.0, norm_rel=0.019281111657619476, ref_abs_avg=17.456478118896484, test_abs_avg=17.456960678100586
torch_compile_phases_forward vs paper_forward output: mean_abs=0.001648015109822154, max_abs=0.046875
torch_compile_phases_forward grad[0] vs paper_forward: mean_abs=0.008788324892520905, max_abs=0.32421875, mean_rel=0.07553820312023163, max_rel=97.1438217163086, norm_rel=0.02066374570131302, ref_abs_avg=0.45862895250320435, test_abs_avg=0.4586341381072998
torch_compile_phases_forward grad[1] vs paper_forward: mean_abs=7.39129114151001, max_abs=56.0, mean_rel=0.1618293821811676, max_rel=251.23391723632812, norm_rel=0.020433999598026276, ref_abs_avg=319.58984375, test_abs_avg=319.64776611328125
torch_compile_phases_forward grad[2] vs paper_forward: mean_abs=1.3116350173950195, max_abs=4.5, mean_rel=0.1009141355752945, max_rel=6.674924373626709, norm_rel=0.02294229343533516, ref_abs_avg=57.5374755859375, test_abs_avg=57.47115707397461
torch_compile_phases_forward grad[3] vs paper_forward: mean_abs=1.6761846542358398, max_abs=12.0, mean_rel=0.1872352808713913, max_rel=2120.450927734375, norm_rel=0.025250915437936783, ref_abs_avg=66.72473907470703, test_abs_avg=66.7283706665039
torch_compile_phases_forward grad[4] vs paper_forward: mean_abs=1.5478436946868896, max_abs=10.15625, mean_rel=0.48812466859817505, max_rel=5249.99951171875, norm_rel=0.023698851466178894, ref_abs_avg=65.64997863769531, test_abs_avg=65.63077545166016
torch_compile_phases_forward grad[5] vs paper_forward: mean_abs=1.0748776197433472, max_abs=4.375, mean_rel=0.12581881880760193, max_rel=10.153935432434082, norm_rel=0.021664168685674667, ref_abs_avg=51.40839385986328, test_abs_avg=51.4614372253418
torch_compile_phases_forward grad[6] vs paper_forward: mean_abs=1.4679088592529297, max_abs=11.0, mean_rel=0.16667380928993225, max_rel=2318.7490234375, norm_rel=0.024974724277853966, ref_abs_avg=59.105438232421875, test_abs_avg=59.11064529418945
torch_compile_phases_forward grad[7] vs paper_forward: mean_abs=1.3511567115783691, max_abs=9.5, mean_rel=0.41837769746780396, max_rel=3749.999755859375, norm_rel=0.02332448586821556, ref_abs_avg=58.19281005859375, test_abs_avg=58.19601821899414
torch_compile_phases_forward grad[8] vs paper_forward: mean_abs=0.9899144172668457, max_abs=4.0, mean_rel=0.12577755749225616, max_rel=17.096906661987305, norm_rel=0.022381305694580078, ref_abs_avg=45.517913818359375, test_abs_avg=45.577449798583984
torch_compile_phases_forward grad[9] vs paper_forward: mean_abs=1.3342866897583008, max_abs=10.0, mean_rel=0.17135906219482422, max_rel=1756.8853759765625, norm_rel=0.024743305519223213, ref_abs_avg=54.241798400878906, test_abs_avg=54.24354553222656
torch_compile_phases_forward grad[10] vs paper_forward: mean_abs=1.2280521392822266, max_abs=8.5, mean_rel=0.38947826623916626, max_rel=3999.999755859375, norm_rel=0.023142583668231964, ref_abs_avg=53.372596740722656, test_abs_avg=53.363765716552734
torch_compile_phases_forward grad[11] vs paper_forward: mean_abs=0.937389612197876, max_abs=4.0, mean_rel=0.10257954895496368, max_rel=6.365262985229492, norm_rel=0.023456966504454613, ref_abs_avg=39.64055252075195, test_abs_avg=39.64796447753906
torch_compile_phases_forward grad[12] vs paper_forward: mean_abs=1.2204580307006836, max_abs=9.0, mean_rel=0.1605377197265625, max_rel=1577.67529296875, norm_rel=0.024522317573428154, ref_abs_avg=50.06644821166992, test_abs_avg=50.07250213623047
torch_compile_phases_forward grad[13] vs paper_forward: mean_abs=1.1245683431625366, max_abs=6.25, mean_rel=0.3153232932090759, max_rel=4250.0, norm_rel=0.023082109168171883, ref_abs_avg=49.023536682128906, test_abs_avg=49.03023910522461
torch_compile_phases_forward grad[14] vs paper_forward: mean_abs=0.8617439270019531, max_abs=3.1875, mean_rel=0.09862968325614929, max_rel=9.083514213562012, norm_rel=0.022283557802438736, ref_abs_avg=40.03599548339844, test_abs_avg=39.996253967285156
torch_compile_phases_forward grad[15] vs paper_forward: mean_abs=1.1359038352966309, max_abs=9.0, mean_rel=0.17034907639026642, max_rel=2756.9140625, norm_rel=0.02446284517645836, ref_abs_avg=46.669776916503906, test_abs_avg=46.674041748046875
torch_compile_phases_forward grad[16] vs paper_forward: mean_abs=1.0522308349609375, max_abs=7.0, mean_rel=0.29612410068511963, max_rel=2656.249755859375, norm_rel=0.022909263148903847, ref_abs_avg=46.14007568359375, test_abs_avg=46.14561462402344
torch_compile_phases_forward grad[17] vs paper_forward: mean_abs=0.8071861267089844, max_abs=2.75, mean_rel=0.080401711165905, max_rel=3.8841402530670166, norm_rel=0.02176818437874317, ref_abs_avg=36.9268684387207, test_abs_avg=36.89405822753906
torch_compile_phases_forward grad[18] vs paper_forward: mean_abs=1.0715076923370361, max_abs=8.0, mean_rel=0.15600033104419708, max_rel=1564.687744140625, norm_rel=0.024282952770590782, ref_abs_avg=44.39404296875, test_abs_avg=44.39265441894531
torch_compile_phases_forward grad[19] vs paper_forward: mean_abs=0.9832800626754761, max_abs=6.125, mean_rel=0.24706262350082397, max_rel=3249.999755859375, norm_rel=0.022610565647482872, ref_abs_avg=43.630287170410156, test_abs_avg=43.62976837158203
torch_compile_phases_forward grad[20] vs paper_forward: mean_abs=0.7841928005218506, max_abs=2.90625, mean_rel=0.10022692382335663, max_rel=11.084853172302246, norm_rel=0.022216128185391426, ref_abs_avg=36.0401496887207, test_abs_avg=36.04876708984375
torch_compile_phases_forward grad[21] vs paper_forward: mean_abs=1.0144768953323364, max_abs=6.5, mean_rel=0.16173449158668518, max_rel=912.6556396484375, norm_rel=0.024171246215701103, ref_abs_avg=42.223487854003906, test_abs_avg=42.22553253173828
torch_compile_phases_forward grad[22] vs paper_forward: mean_abs=0.9325498342514038, max_abs=5.875, mean_rel=0.31966453790664673, max_rel=3515.624755859375, norm_rel=0.022399358451366425, ref_abs_avg=41.80931854248047, test_abs_avg=41.81329345703125
torch_compile_phases_forward grad[23] vs paper_forward: mean_abs=0.7348452806472778, max_abs=3.125, mean_rel=0.10257074981927872, max_rel=13.947447776794434, norm_rel=0.022521547973155975, ref_abs_avg=32.21147918701172, test_abs_avg=32.2094841003418
torch_compile_phases_forward grad[24] vs paper_forward: mean_abs=0.9629726409912109, max_abs=7.0, mean_rel=0.16344232857227325, max_rel=1438.543212890625, norm_rel=0.02406460791826248, ref_abs_avg=40.265785217285156, test_abs_avg=40.26570129394531
torch_compile_phases_forward grad[25] vs paper_forward: mean_abs=0.8840621113777161, max_abs=6.5, mean_rel=0.2669689953327179, max_rel=2937.499755859375, norm_rel=0.022138426080346107, ref_abs_avg=40.1478271484375, test_abs_avg=40.14824676513672
torch_compile_phases_forward grad[26] vs paper_forward: mean_abs=0.8710384368896484, max_abs=4.0, mean_rel=0.17899778485298157, max_rel=47.93675231933594, norm_rel=0.025139737874269485, ref_abs_avg=34.599708557128906, test_abs_avg=34.66582489013672
torch_compile_phases_forward grad[27] vs paper_forward: mean_abs=1.1325793266296387, max_abs=8.0, mean_rel=0.17903661727905273, max_rel=1657.9669189453125, norm_rel=0.026004990562796593, ref_abs_avg=43.76228713989258, test_abs_avg=43.761810302734375
torch_compile_phases_forward grad[28] vs paper_forward: mean_abs=1.0474412441253662, max_abs=7.0, mean_rel=0.3428044319152832, max_rel=3749.999755859375, norm_rel=0.024439072236418724, ref_abs_avg=43.04057312011719, test_abs_avg=43.04554748535156
torch_compile_phases_forward grad[29] vs paper_forward: mean_abs=0.8455071449279785, max_abs=4.0, mean_rel=0.11608415096998215, max_rel=7.975918292999268, norm_rel=0.026259131729602814, ref_abs_avg=31.86901092529297, test_abs_avg=31.977048873901367
torch_compile_phases_forward grad[30] vs paper_forward: mean_abs=1.039879560470581, max_abs=7.0, mean_rel=0.1704941689968109, max_rel=2303.13330078125, norm_rel=0.026191573590040207, ref_abs_avg=39.86585998535156, test_abs_avg=39.86663055419922
torch_compile_phases_forward grad[31] vs paper_forward: mean_abs=0.9661581516265869, max_abs=6.0, mean_rel=0.3183160722255707, max_rel=2999.999755859375, norm_rel=0.02481093257665634, ref_abs_avg=39.160133361816406, test_abs_avg=39.16007995605469
torch_compile_phases_forward grad[32] vs paper_forward: mean_abs=0.7544195652008057, max_abs=3.0, mean_rel=0.18349523842334747, max_rel=22.732454299926758, norm_rel=0.025876484811306, ref_abs_avg=30.129499435424805, test_abs_avg=30.11569595336914
torch_compile_phases_forward grad[33] vs paper_forward: mean_abs=0.9541759490966797, max_abs=7.0, mean_rel=0.16929873824119568, max_rel=1809.8193359375, norm_rel=0.02597673051059246, ref_abs_avg=36.85504150390625, test_abs_avg=36.85620880126953
torch_compile_phases_forward grad[34] vs paper_forward: mean_abs=0.889615535736084, max_abs=5.25, mean_rel=0.284615159034729, max_rel=2234.375, norm_rel=0.02455109916627407, ref_abs_avg=36.38905715942383, test_abs_avg=36.38955307006836
torch_compile_phases_forward grad[35] vs paper_forward: mean_abs=0.72210693359375, max_abs=2.75, mean_rel=0.1465844362974167, max_rel=11.654302597045898, norm_rel=0.025098752230405807, ref_abs_avg=29.148338317871094, test_abs_avg=29.252819061279297
torch_compile_phases_forward grad[36] vs paper_forward: mean_abs=0.8885257244110107, max_abs=6.0, mean_rel=0.1684853434562683, max_rel=1404.3902587890625, norm_rel=0.025665998458862305, ref_abs_avg=34.73957443237305, test_abs_avg=34.739139556884766
torch_compile_phases_forward grad[37] vs paper_forward: mean_abs=0.8346063494682312, max_abs=5.0, mean_rel=0.32351765036582947, max_rel=2843.749755859375, norm_rel=0.02433943748474121, ref_abs_avg=34.361305236816406, test_abs_avg=34.36820983886719
torch_compile_phases_forward grad[38] vs paper_forward: mean_abs=0.675351619720459, max_abs=3.375, mean_rel=0.1310405731201172, max_rel=26.307823181152344, norm_rel=0.024502504616975784, ref_abs_avg=27.240062713623047, test_abs_avg=27.22161293029785
torch_compile_phases_forward grad[39] vs paper_forward: mean_abs=0.8329756259918213, max_abs=6.0, mean_rel=0.17453598976135254, max_rel=2211.538818359375, norm_rel=0.025309910997748375, ref_abs_avg=33.037872314453125, test_abs_avg=33.04058837890625
torch_compile_phases_forward grad[40] vs paper_forward: mean_abs=0.7800535559654236, max_abs=4.75, mean_rel=0.28043460845947266, max_rel=2406.25, norm_rel=0.02390250936150551, ref_abs_avg=32.68537521362305, test_abs_avg=32.68944549560547
torch_compile_phases_forward grad[41] vs paper_forward: mean_abs=0.615656852722168, max_abs=2.9375, mean_rel=0.09937770664691925, max_rel=8.47173023223877, norm_rel=0.02394501306116581, ref_abs_avg=25.85531234741211, test_abs_avg=25.886648178100586
torch_compile_phases_forward grad[42] vs paper_forward: mean_abs=0.7986364364624023, max_abs=5.0, mean_rel=0.15795351564884186, max_rel=1216.527587890625, norm_rel=0.02509431540966034, ref_abs_avg=31.90638542175293, test_abs_avg=31.90732192993164
torch_compile_phases_forward grad[43] vs paper_forward: mean_abs=0.7393842935562134, max_abs=4.5, mean_rel=0.2295708954334259, max_rel=1953.1248779296875, norm_rel=0.023383624851703644, ref_abs_avg=31.695571899414062, test_abs_avg=31.699588775634766
torch_compile_phases_forward grad[44] vs paper_forward: mean_abs=0.5604209899902344, max_abs=2.375, mean_rel=0.07157621532678604, max_rel=6.088569164276123, norm_rel=0.02186121977865696, ref_abs_avg=26.600418090820312, test_abs_avg=26.63553237915039
torch_compile_phases_forward grad[45] vs paper_forward: mean_abs=0.7597352266311646, max_abs=5.0, mean_rel=0.1637580692768097, max_rel=852.0859375, norm_rel=0.024732491001486778, ref_abs_avg=30.825637817382812, test_abs_avg=30.826560974121094
torch_compile_phases_forward grad[46] vs paper_forward: mean_abs=0.7017775177955627, max_abs=4.25, mean_rel=0.27351313829421997, max_rel=1937.4998779296875, norm_rel=0.02318081073462963, ref_abs_avg=30.375349044799805, test_abs_avg=30.377750396728516
torch_compile_phases_forward grad[47] vs paper_forward: mean_abs=0.6086664199829102, max_abs=2.125, mean_rel=0.12532871961593628, max_rel=17.30242156982422, norm_rel=0.02525186352431774, ref_abs_avg=23.733657836914062, test_abs_avg=23.71814727783203
torch_compile_phases_forward grad[48] vs paper_forward: mean_abs=0.7245894074440002, max_abs=5.0, mean_rel=0.16168330609798431, max_rel=1893.3721923828125, norm_rel=0.024824704974889755, ref_abs_avg=29.31985855102539, test_abs_avg=29.320476531982422
torch_compile_phases_forward grad[49] vs paper_forward: mean_abs=0.6735522150993347, max_abs=4.125, mean_rel=0.30725088715553284, max_rel=2593.749755859375, norm_rel=0.023302234709262848, ref_abs_avg=28.959705352783203, test_abs_avg=28.962982177734375
torch_compile_phases_forward grad[50] vs paper_forward: mean_abs=0.6192655563354492, max_abs=3.5, mean_rel=0.06946322321891785, max_rel=3.2948553562164307, norm_rel=0.023089438676834106, ref_abs_avg=27.698610305786133, test_abs_avg=27.650611877441406
torch_compile_phases_forward grad[51] vs paper_forward: mean_abs=0.8296769857406616, max_abs=5.5, mean_rel=0.18198749423027039, max_rel=2461.006591796875, norm_rel=0.026139046996831894, ref_abs_avg=31.853595733642578, test_abs_avg=31.85426139831543
torch_compile_phases_forward grad[52] vs paper_forward: mean_abs=0.7712140083312988, max_abs=5.25, mean_rel=0.2866182029247284, max_rel=2187.5, norm_rel=0.024750160053372383, ref_abs_avg=31.298931121826172, test_abs_avg=31.302268981933594
torch_compile_phases_forward grad[53] vs paper_forward: mean_abs=0.5949852466583252, max_abs=2.75, mean_rel=0.18852460384368896, max_rel=36.13014221191406, norm_rel=0.024063413962721825, ref_abs_avg=24.943328857421875, test_abs_avg=24.948701858520508
torch_compile_phases_forward grad[54] vs paper_forward: mean_abs=0.7554843425750732, max_abs=5.25, mean_rel=0.16290852427482605, max_rel=1240.4417724609375, norm_rel=0.025828374549746513, ref_abs_avg=29.3315486907959, test_abs_avg=29.330129623413086
torch_compile_phases_forward grad[55] vs paper_forward: mean_abs=0.7048234939575195, max_abs=4.71875, mean_rel=0.2509135603904724, max_rel=2812.499755859375, norm_rel=0.024248559027910233, ref_abs_avg=29.098894119262695, test_abs_avg=29.10171127319336
torch_compile_phases_forward grad[56] vs paper_forward: mean_abs=0.5465757250785828, max_abs=2.0625, mean_rel=0.36449047923088074, max_rel=131.2376251220703, norm_rel=0.023236826062202454, ref_abs_avg=23.756515502929688, test_abs_avg=23.782611846923828
torch_compile_phases_forward grad[57] vs paper_forward: mean_abs=0.7046440839767456, max_abs=5.25, mean_rel=0.1519429087638855, max_rel=835.3623657226562, norm_rel=0.02523192949593067, ref_abs_avg=27.99838638305664, test_abs_avg=27.998332977294922
torch_compile_phases_forward grad[58] vs paper_forward: mean_abs=0.6519714593887329, max_abs=4.5, mean_rel=0.23026692867279053, max_rel=1921.8748779296875, norm_rel=0.023550521582365036, ref_abs_avg=27.70272445678711, test_abs_avg=27.699066162109375
torch_compile_phases_forward grad[59] vs paper_forward: mean_abs=0.5263628959655762, max_abs=2.0, mean_rel=0.10855646431446075, max_rel=8.713730812072754, norm_rel=0.02414071187376976, ref_abs_avg=22.490476608276367, test_abs_avg=22.56168556213379
torch_compile_phases_forward grad[60] vs paper_forward: mean_abs=0.6631444692611694, max_abs=5.0, mean_rel=0.16254091262817383, max_rel=1389.9266357421875, norm_rel=0.024844588711857796, ref_abs_avg=26.725364685058594, test_abs_avg=26.725017547607422
torch_compile_phases_forward grad[61] vs paper_forward: mean_abs=0.6132424473762512, max_abs=4.111328125, mean_rel=0.2563115358352661, max_rel=1749.9998779296875, norm_rel=0.02360721305012703, ref_abs_avg=26.035396575927734, test_abs_avg=26.03386878967285
torch_compile_phases_forward grad[62] vs paper_forward: mean_abs=0.45703524351119995, max_abs=2.0, mean_rel=0.2947520613670349, max_rel=73.16302490234375, norm_rel=0.022289037704467773, ref_abs_avg=21.017480850219727, test_abs_avg=20.992366790771484
torch_compile_phases_forward grad[63] vs paper_forward: mean_abs=0.6180276274681091, max_abs=5.0, mean_rel=0.15733644366264343, max_rel=1484.1507568359375, norm_rel=0.024437548592686653, ref_abs_avg=25.33633804321289, test_abs_avg=25.33820343017578
torch_compile_phases_forward grad[64] vs paper_forward: mean_abs=0.5798672437667847, max_abs=4.5, mean_rel=0.25751376152038574, max_rel=1874.9998779296875, norm_rel=0.02296976000070572, ref_abs_avg=25.24350929260254, test_abs_avg=25.24893569946289
torch_compile_phases_forward grad[65] vs paper_forward: mean_abs=0.45262235403060913, max_abs=1.75, mean_rel=0.207240492105484, max_rel=65.40861511230469, norm_rel=0.02259194850921631, ref_abs_avg=19.87627410888672, test_abs_avg=19.888158798217773
torch_compile_phases_forward grad[66] vs paper_forward: mean_abs=0.5814621448516846, max_abs=4.5, mean_rel=0.15092965960502625, max_rel=778.9324340820312, norm_rel=0.02401644177734852, ref_abs_avg=24.22720718383789, test_abs_avg=24.22762107849121
torch_compile_phases_forward grad[67] vs paper_forward: mean_abs=0.5402849316596985, max_abs=4.0, mean_rel=0.22121703624725342, max_rel=1718.7498779296875, norm_rel=0.02246769145131111, ref_abs_avg=24.0753116607666, test_abs_avg=24.071462631225586
torch_compile_phases_forward grad[68] vs paper_forward: mean_abs=0.43665385246276855, max_abs=1.75, mean_rel=0.12357442826032639, max_rel=34.94300842285156, norm_rel=0.022617915645241737, ref_abs_avg=19.608596801757812, test_abs_avg=19.60019874572754
torch_compile_phases_forward grad[69] vs paper_forward: mean_abs=0.5509377121925354, max_abs=4.5, mean_rel=0.14376121759414673, max_rel=547.262939453125, norm_rel=0.02352033741772175, ref_abs_avg=23.459110260009766, test_abs_avg=23.458385467529297
torch_compile_phases_forward grad[70] vs paper_forward: mean_abs=0.5102211236953735, max_abs=3.34375, mean_rel=0.24280503392219543, max_rel=1812.4998779296875, norm_rel=0.02183871529996395, ref_abs_avg=23.3150634765625, test_abs_avg=23.32305908203125
torch_compile_phases_forward grad[71] vs paper_forward: mean_abs=0.40372180938720703, max_abs=1.625, mean_rel=0.060084354132413864, max_rel=4.7596211433410645, norm_rel=0.021523473784327507, ref_abs_avg=19.372787475585938, test_abs_avg=19.39461326599121
torch_compile_phases_forward grad[72] vs paper_forward: mean_abs=0.5265952944755554, max_abs=5.0, mean_rel=0.14939647912979126, max_rel=810.693359375, norm_rel=0.023177681490778923, ref_abs_avg=22.716182708740234, test_abs_avg=22.71697235107422
torch_compile_phases_forward grad[73] vs paper_forward: mean_abs=0.4878343343734741, max_abs=3.75, mean_rel=0.22915887832641602, max_rel=1718.7498779296875, norm_rel=0.02166999876499176, ref_abs_avg=22.518728256225586, test_abs_avg=22.522796630859375
torch_compile_phases_forward grad[74] vs paper_forward: mean_abs=0.4556589126586914, max_abs=1.625, mean_rel=0.1275496631860733, max_rel=19.408388137817383, norm_rel=0.022232089191675186, ref_abs_avg=20.40837287902832, test_abs_avg=20.437397003173828
torch_compile_phases_forward grad[75] vs paper_forward: mean_abs=0.5746386647224426, max_abs=4.5625, mean_rel=0.16569378972053528, max_rel=1046.1280517578125, norm_rel=0.02464979887008667, ref_abs_avg=23.374399185180664, test_abs_avg=23.376789093017578
torch_compile_phases_forward grad[76] vs paper_forward: mean_abs=0.5326632261276245, max_abs=3.5, mean_rel=0.27354419231414795, max_rel=1468.7498779296875, norm_rel=0.02298913709819317, ref_abs_avg=23.21087074279785, test_abs_avg=23.214078903198242
torch_compile_phases_forward grad[77] vs paper_forward: mean_abs=0.4170147180557251, max_abs=2.0, mean_rel=0.1343381702899933, max_rel=14.533924102783203, norm_rel=0.02200690656900406, ref_abs_avg=18.938186645507812, test_abs_avg=18.956008911132812
torch_compile_phases_forward grad[78] vs paper_forward: mean_abs=0.5385406017303467, max_abs=5.0, mean_rel=0.1532166302204132, max_rel=1481.4552001953125, norm_rel=0.023977095261216164, ref_abs_avg=22.45486831665039, test_abs_avg=22.455778121948242
torch_compile_phases_forward grad[79] vs paper_forward: mean_abs=0.492520272731781, max_abs=4.0, mean_rel=0.20357635617256165, max_rel=1781.2498779296875, norm_rel=0.02239588461816311, ref_abs_avg=22.030048370361328, test_abs_avg=22.043806076049805
torch_compile_phases_forward grad[80] vs paper_forward: mean_abs=0.38557279109954834, max_abs=1.5625, mean_rel=0.20392107963562012, max_rel=72.62834930419922, norm_rel=0.021349143236875534, ref_abs_avg=18.35239028930664, test_abs_avg=18.339750289916992
torch_compile_phases_forward grad[81] vs paper_forward: mean_abs=0.49840065836906433, max_abs=4.25, mean_rel=0.1535964012145996, max_rel=700.8267211914062, norm_rel=0.023806080222129822, ref_abs_avg=20.979209899902344, test_abs_avg=20.981294631958008
torch_compile_phases_forward grad[82] vs paper_forward: mean_abs=0.4556092619895935, max_abs=3.5, mean_rel=0.20770446956157684, max_rel=1624.9998779296875, norm_rel=0.022458961233496666, ref_abs_avg=20.310951232910156, test_abs_avg=20.312265396118164
torch_compile_phases_forward grad[83] vs paper_forward: mean_abs=0.3825392723083496, max_abs=1.53125, mean_rel=0.07704947143793106, max_rel=5.079176902770996, norm_rel=0.023430610075592995, ref_abs_avg=16.32668685913086, test_abs_avg=16.317758560180664
torch_compile_phases_forward grad[84] vs paper_forward: mean_abs=0.46482259035110474, max_abs=4.0, mean_rel=0.14094047248363495, max_rel=778.3856811523438, norm_rel=0.022909456863999367, ref_abs_avg=20.36647605895996, test_abs_avg=20.367061614990234
torch_compile_phases_forward grad[85] vs paper_forward: mean_abs=0.41929876804351807, max_abs=3.5, mean_rel=0.2104991376399994, max_rel=1874.9998779296875, norm_rel=0.02114182524383068, ref_abs_avg=19.8358154296875, test_abs_avg=19.835556030273438
torch_compile_phases_forward grad[86] vs paper_forward: mean_abs=0.3240971565246582, max_abs=1.25, mean_rel=0.08600867539644241, max_rel=5.801814079284668, norm_rel=0.020410289987921715, ref_abs_avg=15.952723503112793, test_abs_avg=15.948334693908691
torch_compile_phases_forward grad[87] vs paper_forward: mean_abs=0.43740811944007874, max_abs=4.0, mean_rel=0.1394105851650238, max_rel=886.429931640625, norm_rel=0.02251557447016239, ref_abs_avg=19.52667999267578, test_abs_avg=19.527435302734375
torch_compile_phases_forward grad[88] vs paper_forward: mean_abs=0.3997282087802887, max_abs=3.0, mean_rel=0.18256400525569916, max_rel=1374.9998779296875, norm_rel=0.020437875762581825, ref_abs_avg=19.54051971435547, test_abs_avg=19.53818702697754
torch_compile_phases_forward grad[89] vs paper_forward: mean_abs=0.32159343361854553, max_abs=1.3125, mean_rel=0.6075490117073059, max_rel=269.5559387207031, norm_rel=0.020712904632091522, ref_abs_avg=15.598333358764648, test_abs_avg=15.598876953125
torch_compile_phases_forward grad[90] vs paper_forward: mean_abs=0.40643781423568726, max_abs=4.625, mean_rel=0.13569018244743347, max_rel=751.7828369140625, norm_rel=0.02200782112777233, ref_abs_avg=18.595012664794922, test_abs_avg=18.59608268737793
torch_compile_phases_forward grad[91] vs paper_forward: mean_abs=0.37316492199897766, max_abs=4.25, mean_rel=0.1690014898777008, max_rel=1218.75, norm_rel=0.020720137283205986, ref_abs_avg=18.26932716369629, test_abs_avg=18.263822555541992
torch_compile_phases_forward grad[92] vs paper_forward: mean_abs=0.32005441188812256, max_abs=1.375, mean_rel=0.09653909504413605, max_rel=4.607128143310547, norm_rel=0.020474320277571678, ref_abs_avg=15.705026626586914, test_abs_avg=15.703071594238281
torch_compile_phases_forward grad[93] vs paper_forward: mean_abs=0.39198797941207886, max_abs=4.0, mean_rel=0.12459421157836914, max_rel=599.9960327148438, norm_rel=0.021657120436429977, ref_abs_avg=18.29189682006836, test_abs_avg=18.29184913635254
torch_compile_phases_forward grad[94] vs paper_forward: mean_abs=0.3524899482727051, max_abs=3.5, mean_rel=0.1773948222398758, max_rel=1187.5, norm_rel=0.01969284377992153, ref_abs_avg=18.114578247070312, test_abs_avg=18.114545822143555
torch_compile_phases_forward grad[95] vs paper_forward: mean_abs=0.2770901024341583, max_abs=1.25, mean_rel=0.22149714827537537, max_rel=69.68512725830078, norm_rel=0.019670706242322922, ref_abs_avg=14.120006561279297, test_abs_avg=14.12101936340332
torch_compile_phases_forward grad[96] vs paper_forward: mean_abs=0.36374691128730774, max_abs=3.75, mean_rel=0.13174059987068176, max_rel=1278.5548095703125, norm_rel=0.02098209224641323, ref_abs_avg=17.59404754638672, test_abs_avg=17.595645904541016
torch_compile_phases_forward grad[97] vs paper_forward: mean_abs=0.33060240745544434, max_abs=3.4375, mean_rel=0.2027246057987213, max_rel=2312.5, norm_rel=0.019472727552056313, ref_abs_avg=17.456478118896484, test_abs_avg=17.458297729492188
production_forward2 vs paper_forward output: mean_abs=0.0016450458206236362, max_abs=0.046875
production_forward2 grad[0] vs paper_forward: mean_abs=0.008444223552942276, max_abs=0.359375, mean_rel=0.07291826605796814, max_rel=108.71244812011719, norm_rel=0.019977720454335213, ref_abs_avg=0.45862895250320435, test_abs_avg=0.458646297454834
production_forward2 grad[1] vs paper_forward: mean_abs=7.227252006530762, max_abs=64.0, mean_rel=0.1811831146478653, max_rel=437.25360107421875, norm_rel=0.019990142434835434, ref_abs_avg=319.58984375, test_abs_avg=319.6854248046875
production_forward2 grad[2] vs paper_forward: mean_abs=1.2596702575683594, max_abs=5.0, mean_rel=0.10924942046403885, max_rel=10.911587715148926, norm_rel=0.02240041270852089, ref_abs_avg=57.5374755859375, test_abs_avg=57.495216369628906
production_forward2 grad[3] vs paper_forward: mean_abs=1.6259660720825195, max_abs=10.0, mean_rel=0.19086840748786926, max_rel=2048.8388671875, norm_rel=0.024516236037015915, ref_abs_avg=66.72473907470703, test_abs_avg=66.73091125488281
production_forward2 grad[4] vs paper_forward: mean_abs=1.4956934452056885, max_abs=10.0, mean_rel=0.4647892713546753, max_rel=5187.49951171875, norm_rel=0.02289823815226555, ref_abs_avg=65.64997863769531, test_abs_avg=65.64208984375
production_forward2 grad[5] vs paper_forward: mean_abs=1.1276391744613647, max_abs=4.375, mean_rel=0.1355341225862503, max_rel=15.180414199829102, norm_rel=0.02217925153672695, ref_abs_avg=51.40839385986328, test_abs_avg=51.434730529785156
production_forward2 grad[6] vs paper_forward: mean_abs=1.423633337020874, max_abs=10.0, mean_rel=0.16240772604942322, max_rel=2217.306884765625, norm_rel=0.024230744689702988, ref_abs_avg=59.105438232421875, test_abs_avg=59.11161804199219
production_forward2 grad[7] vs paper_forward: mean_abs=1.3101398944854736, max_abs=7.75, mean_rel=0.3714177906513214, max_rel=4500.0, norm_rel=0.02262510173022747, ref_abs_avg=58.19281005859375, test_abs_avg=58.196571350097656
production_forward2 grad[8] vs paper_forward: mean_abs=1.0055952072143555, max_abs=4.0, mean_rel=0.1481872946023941, max_rel=26.975772857666016, norm_rel=0.022552169859409332, ref_abs_avg=45.517913818359375, test_abs_avg=45.54708480834961
production_forward2 grad[9] vs paper_forward: mean_abs=1.2968623638153076, max_abs=8.0, mean_rel=0.16645529866218567, max_rel=1637.5213623046875, norm_rel=0.024080118164420128, ref_abs_avg=54.241798400878906, test_abs_avg=54.24333190917969
production_forward2 grad[10] vs paper_forward: mean_abs=1.1908769607543945, max_abs=7.0, mean_rel=0.3844758868217468, max_rel=3624.999755859375, norm_rel=0.02244643121957779, ref_abs_avg=53.372596740722656, test_abs_avg=53.36600112915039
production_forward2 grad[11] vs paper_forward: mean_abs=0.894012451171875, max_abs=3.75, mean_rel=0.10231302678585052, max_rel=7.727711200714111, norm_rel=0.02253379113972187, ref_abs_avg=39.64055252075195, test_abs_avg=39.665897369384766
production_forward2 grad[12] vs paper_forward: mean_abs=1.1893693208694458, max_abs=8.0, mean_rel=0.14940118789672852, max_rel=1579.0809326171875, norm_rel=0.023906022310256958, ref_abs_avg=50.06644821166992, test_abs_avg=50.07368469238281
production_forward2 grad[13] vs paper_forward: mean_abs=1.0898501873016357, max_abs=6.25, mean_rel=0.2754914164543152, max_rel=3374.999755859375, norm_rel=0.022363299503922462, ref_abs_avg=49.023536682128906, test_abs_avg=49.02763366699219
production_forward2 grad[14] vs paper_forward: mean_abs=0.8395442962646484, max_abs=3.375, mean_rel=0.10179779678583145, max_rel=8.721981048583984, norm_rel=0.02189335972070694, ref_abs_avg=40.03599548339844, test_abs_avg=40.02447509765625
production_forward2 grad[15] vs paper_forward: mean_abs=1.107560396194458, max_abs=7.4375, mean_rel=0.1745210886001587, max_rel=2642.03515625, norm_rel=0.02385355904698372, ref_abs_avg=46.669776916503906, test_abs_avg=46.67433166503906
production_forward2 grad[16] vs paper_forward: mean_abs=1.0232365131378174, max_abs=6.0, mean_rel=0.28527897596359253, max_rel=2999.999755859375, norm_rel=0.022283319383859634, ref_abs_avg=46.14007568359375, test_abs_avg=46.14365768432617
production_forward2 grad[17] vs paper_forward: mean_abs=0.7955845594406128, max_abs=3.0, mean_rel=0.07675979286432266, max_rel=2.3553762435913086, norm_rel=0.021417830139398575, ref_abs_avg=36.9268684387207, test_abs_avg=36.88798522949219
production_forward2 grad[18] vs paper_forward: mean_abs=1.047884464263916, max_abs=7.5, mean_rel=0.1500939130783081, max_rel=1413.284912109375, norm_rel=0.023753877729177475, ref_abs_avg=44.39404296875, test_abs_avg=44.393898010253906
production_forward2 grad[19] vs paper_forward: mean_abs=0.9591789245605469, max_abs=6.0, mean_rel=0.2459820806980133, max_rel=2656.249755859375, norm_rel=0.022092368453741074, ref_abs_avg=43.630287170410156, test_abs_avg=43.629825592041016
production_forward2 grad[20] vs paper_forward: mean_abs=0.7607011795043945, max_abs=3.125, mean_rel=0.08494707942008972, max_rel=3.55193829536438, norm_rel=0.021660180762410164, ref_abs_avg=36.0401496887207, test_abs_avg=35.99089813232422
production_forward2 grad[21] vs paper_forward: mean_abs=0.991024374961853, max_abs=8.0, mean_rel=0.15867391228675842, max_rel=1105.81884765625, norm_rel=0.02361636608839035, ref_abs_avg=42.223487854003906, test_abs_avg=42.22532653808594
production_forward2 grad[22] vs paper_forward: mean_abs=0.9080257415771484, max_abs=5.125, mean_rel=0.3100520372390747, max_rel=2656.249755859375, norm_rel=0.02185431309044361, ref_abs_avg=41.80931854248047, test_abs_avg=41.81483459472656
production_forward2 grad[23] vs paper_forward: mean_abs=0.7391000986099243, max_abs=3.0, mean_rel=0.08922338485717773, max_rel=10.747628211975098, norm_rel=0.0229354128241539, ref_abs_avg=32.21147918701172, test_abs_avg=32.22510528564453
production_forward2 grad[24] vs paper_forward: mean_abs=0.9428016543388367, max_abs=7.0, mean_rel=0.15350931882858276, max_rel=984.1834106445312, norm_rel=0.023569198325276375, ref_abs_avg=40.265785217285156, test_abs_avg=40.266658782958984
production_forward2 grad[25] vs paper_forward: mean_abs=0.8636548519134521, max_abs=5.5, mean_rel=0.2587338089942932, max_rel=2874.999755859375, norm_rel=0.021627215668559074, ref_abs_avg=40.1478271484375, test_abs_avg=40.153507232666016
production_forward2 grad[26] vs paper_forward: mean_abs=0.8611125946044922, max_abs=3.25, mean_rel=0.17637789249420166, max_rel=44.40983200073242, norm_rel=0.02486223168671131, ref_abs_avg=34.599708557128906, test_abs_avg=34.676963806152344
production_forward2 grad[27] vs paper_forward: mean_abs=1.1069390773773193, max_abs=7.3515625, mean_rel=0.1787773072719574, max_rel=1597.5428466796875, norm_rel=0.025431375950574875, ref_abs_avg=43.76228713989258, test_abs_avg=43.76323699951172
production_forward2 grad[28] vs paper_forward: mean_abs=1.0221340656280518, max_abs=6.0, mean_rel=0.37479931116104126, max_rel=3499.999755859375, norm_rel=0.023845532909035683, ref_abs_avg=43.04057312011719, test_abs_avg=43.05241775512695
production_forward2 grad[29] vs paper_forward: mean_abs=0.8396158218383789, max_abs=3.75, mean_rel=0.11097751557826996, max_rel=6.206326007843018, norm_rel=0.025917481631040573, ref_abs_avg=31.86901092529297, test_abs_avg=31.98552703857422
production_forward2 grad[30] vs paper_forward: mean_abs=1.0189509391784668, max_abs=7.0, mean_rel=0.16873076558113098, max_rel=1642.9068603515625, norm_rel=0.025670763105154037, ref_abs_avg=39.86585998535156, test_abs_avg=39.867340087890625
production_forward2 grad[31] vs paper_forward: mean_abs=0.9449675679206848, max_abs=6.0, mean_rel=0.3009238839149475, max_rel=3124.999755859375, norm_rel=0.024264030158519745, ref_abs_avg=39.160133361816406, test_abs_avg=39.161231994628906
production_forward2 grad[32] vs paper_forward: mean_abs=0.7429039478302002, max_abs=3.0, mean_rel=0.19312244653701782, max_rel=40.179405212402344, norm_rel=0.025258654728531837, ref_abs_avg=30.129499435424805, test_abs_avg=30.148435592651367
production_forward2 grad[33] vs paper_forward: mean_abs=0.9358169436454773, max_abs=6.5, mean_rel=0.1653681993484497, max_rel=1411.955078125, norm_rel=0.02549894154071808, ref_abs_avg=36.85504150390625, test_abs_avg=36.855621337890625
production_forward2 grad[34] vs paper_forward: mean_abs=0.8686946630477905, max_abs=5.5, mean_rel=0.3022899627685547, max_rel=2796.874755859375, norm_rel=0.023978305980563164, ref_abs_avg=36.38905715942383, test_abs_avg=36.3893928527832
production_forward2 grad[35] vs paper_forward: mean_abs=0.7178869247436523, max_abs=2.75, mean_rel=0.13978509604930878, max_rel=15.566601753234863, norm_rel=0.024263840168714523, ref_abs_avg=29.148338317871094, test_abs_avg=29.23172950744629
production_forward2 grad[36] vs paper_forward: mean_abs=0.8721818327903748, max_abs=6.0, mean_rel=0.16424931585788727, max_rel=1781.2498779296875, norm_rel=0.025211911648511887, ref_abs_avg=34.73957443237305, test_abs_avg=34.73963928222656
production_forward2 grad[37] vs paper_forward: mean_abs=0.8176021575927734, max_abs=5.0, mean_rel=0.3415539860725403, max_rel=2765.624755859375, norm_rel=0.02386891283094883, ref_abs_avg=34.361305236816406, test_abs_avg=34.36894989013672
production_forward2 grad[38] vs paper_forward: mean_abs=0.6597189903259277, max_abs=3.0, mean_rel=0.08497805893421173, max_rel=5.065256118774414, norm_rel=0.024234065786004066, ref_abs_avg=27.240062713623047, test_abs_avg=27.20521354675293
production_forward2 grad[39] vs paper_forward: mean_abs=0.820478081703186, max_abs=6.0, mean_rel=0.17517031729221344, max_rel=2112.134521484375, norm_rel=0.024945829063653946, ref_abs_avg=33.037872314453125, test_abs_avg=33.041748046875
production_forward2 grad[40] vs paper_forward: mean_abs=0.7688724994659424, max_abs=4.75, mean_rel=0.287550687789917, max_rel=2093.75, norm_rel=0.02357896789908409, ref_abs_avg=32.68537521362305, test_abs_avg=32.69076919555664
production_forward2 grad[41] vs paper_forward: mean_abs=0.5996084213256836, max_abs=2.6875, mean_rel=0.09897983074188232, max_rel=10.975088119506836, norm_rel=0.02301330491900444, ref_abs_avg=25.85531234741211, test_abs_avg=25.90089225769043
production_forward2 grad[42] vs paper_forward: mean_abs=0.7864789962768555, max_abs=5.25, mean_rel=0.15426817536354065, max_rel=1042.763427734375, norm_rel=0.024744749069213867, ref_abs_avg=31.90638542175293, test_abs_avg=31.90713882446289
production_forward2 grad[43] vs paper_forward: mean_abs=0.7268680334091187, max_abs=4.25, mean_rel=0.22773152589797974, max_rel=2328.125, norm_rel=0.022996140643954277, ref_abs_avg=31.695571899414062, test_abs_avg=31.697603225708008
production_forward2 grad[44] vs paper_forward: mean_abs=0.5784702301025391, max_abs=2.15625, mean_rel=0.06968170404434204, max_rel=7.02699089050293, norm_rel=0.02246478945016861, ref_abs_avg=26.600418090820312, test_abs_avg=26.625524520874023
production_forward2 grad[45] vs paper_forward: mean_abs=0.7498276233673096, max_abs=5.0, mean_rel=0.1629452407360077, max_rel=739.204345703125, norm_rel=0.024426843971014023, ref_abs_avg=30.825637817382812, test_abs_avg=30.829425811767578
production_forward2 grad[46] vs paper_forward: mean_abs=0.6928145885467529, max_abs=4.625, mean_rel=0.25994303822517395, max_rel=2046.8748779296875, norm_rel=0.022868502885103226, ref_abs_avg=30.375349044799805, test_abs_avg=30.381025314331055
production_forward2 grad[47] vs paper_forward: mean_abs=0.5993366241455078, max_abs=2.5, mean_rel=0.1268533319234848, max_rel=13.248059272766113, norm_rel=0.024747209623456, ref_abs_avg=23.733657836914062, test_abs_avg=23.72007179260254
production_forward2 grad[48] vs paper_forward: mean_abs=0.7156991362571716, max_abs=5.5, mean_rel=0.15911340713500977, max_rel=1534.259033203125, norm_rel=0.024515923112630844, ref_abs_avg=29.31985855102539, test_abs_avg=29.320266723632812
production_forward2 grad[49] vs paper_forward: mean_abs=0.6645218133926392, max_abs=4.5, mean_rel=0.295804888010025, max_rel=2999.999755859375, norm_rel=0.022992664948105812, ref_abs_avg=28.959705352783203, test_abs_avg=28.96364974975586
production_forward2 grad[50] vs paper_forward: mean_abs=0.6299839019775391, max_abs=3.0, mean_rel=0.0766175240278244, max_rel=4.288547992706299, norm_rel=0.02280295453965664, ref_abs_avg=27.698610305786133, test_abs_avg=27.706409454345703
production_forward2 grad[51] vs paper_forward: mean_abs=0.8169194459915161, max_abs=6.0, mean_rel=0.17951367795467377, max_rel=2708.5703125, norm_rel=0.025728631764650345, ref_abs_avg=31.853595733642578, test_abs_avg=31.853858947753906
production_forward2 grad[52] vs paper_forward: mean_abs=0.7600614428520203, max_abs=5.25, mean_rel=0.257151335477829, max_rel=1937.4998779296875, norm_rel=0.024361014366149902, ref_abs_avg=31.298931121826172, test_abs_avg=31.299198150634766
production_forward2 grad[53] vs paper_forward: mean_abs=0.5825588703155518, max_abs=2.375, mean_rel=0.18777644634246826, max_rel=29.071075439453125, norm_rel=0.023476798087358475, ref_abs_avg=24.943328857421875, test_abs_avg=24.933591842651367
production_forward2 grad[54] vs paper_forward: mean_abs=0.7455097436904907, max_abs=5.25, mean_rel=0.16192248463630676, max_rel=1066.8984375, norm_rel=0.025485552847385406, ref_abs_avg=29.3315486907959, test_abs_avg=29.330503463745117
production_forward2 grad[55] vs paper_forward: mean_abs=0.6946426630020142, max_abs=4.5, mean_rel=0.2549383044242859, max_rel=2999.999755859375, norm_rel=0.02390996553003788, ref_abs_avg=29.098894119262695, test_abs_avg=29.100770950317383
production_forward2 grad[56] vs paper_forward: mean_abs=0.5296551585197449, max_abs=2.1875, mean_rel=0.6764824986457825, max_rel=295.7084655761719, norm_rel=0.022595779970288277, ref_abs_avg=23.756515502929688, test_abs_avg=23.782791137695312
production_forward2 grad[57] vs paper_forward: mean_abs=0.6954368352890015, max_abs=4.6171875, mean_rel=0.1506347954273224, max_rel=620.3104858398438, norm_rel=0.024914776906371117, ref_abs_avg=27.99838638305664, test_abs_avg=27.998584747314453
production_forward2 grad[58] vs paper_forward: mean_abs=0.6435440182685852, max_abs=4.0, mean_rel=0.22108350694179535, max_rel=2062.5, norm_rel=0.02323959767818451, ref_abs_avg=27.70272445678711, test_abs_avg=27.69784927368164
production_forward2 grad[59] vs paper_forward: mean_abs=0.5105729103088379, max_abs=2.0, mean_rel=0.09846585988998413, max_rel=7.207757472991943, norm_rel=0.02350657805800438, ref_abs_avg=22.490476608276367, test_abs_avg=22.57147216796875
production_forward2 grad[60] vs paper_forward: mean_abs=0.655887246131897, max_abs=6.0, mean_rel=0.15836197137832642, max_rel=873.6090087890625, norm_rel=0.024569246917963028, ref_abs_avg=26.725364685058594, test_abs_avg=26.724897384643555
production_forward2 grad[61] vs paper_forward: mean_abs=0.6035771369934082, max_abs=3.625, mean_rel=0.23585177958011627, max_rel=1624.9998779296875, norm_rel=0.023245856165885925, ref_abs_avg=26.035396575927734, test_abs_avg=26.0344295501709
production_forward2 grad[62] vs paper_forward: mean_abs=0.46455806493759155, max_abs=2.21484375, mean_rel=0.2135365903377533, max_rel=38.86530685424805, norm_rel=0.02269836701452732, ref_abs_avg=21.017480850219727, test_abs_avg=21.011913299560547
production_forward2 grad[63] vs paper_forward: mean_abs=0.6117106676101685, max_abs=4.75, mean_rel=0.15737250447273254, max_rel=1274.0281982421875, norm_rel=0.024192266166210175, ref_abs_avg=25.33633804321289, test_abs_avg=25.33696174621582
production_forward2 grad[64] vs paper_forward: mean_abs=0.571848452091217, max_abs=4.25, mean_rel=0.2677399516105652, max_rel=2062.5, norm_rel=0.022676125168800354, ref_abs_avg=25.24350929260254, test_abs_avg=25.247047424316406
production_forward2 grad[65] vs paper_forward: mean_abs=0.44769924879074097, max_abs=1.75, mean_rel=0.11184124648571014, max_rel=17.5877685546875, norm_rel=0.022599495947360992, ref_abs_avg=19.87627410888672, test_abs_avg=19.878864288330078
production_forward2 grad[66] vs paper_forward: mean_abs=0.575779914855957, max_abs=4.5, mean_rel=0.1509983390569687, max_rel=899.8472900390625, norm_rel=0.023787761107087135, ref_abs_avg=24.22720718383789, test_abs_avg=24.227108001708984
production_forward2 grad[67] vs paper_forward: mean_abs=0.5351546406745911, max_abs=3.5, mean_rel=0.20845156908035278, max_rel=1687.4998779296875, norm_rel=0.022254839539527893, ref_abs_avg=24.0753116607666, test_abs_avg=24.06705665588379
production_forward2 grad[68] vs paper_forward: mean_abs=0.44024014472961426, max_abs=1.6875, mean_rel=0.09916800260543823, max_rel=21.03569984436035, norm_rel=0.02273874543607235, ref_abs_avg=19.608596801757812, test_abs_avg=19.596824645996094
production_forward2 grad[69] vs paper_forward: mean_abs=0.545862078666687, max_abs=4.5, mean_rel=0.14423441886901855, max_rel=640.0526733398438, norm_rel=0.023300331085920334, ref_abs_avg=23.459110260009766, test_abs_avg=23.45794677734375
production_forward2 grad[70] vs paper_forward: mean_abs=0.5076391696929932, max_abs=3.75, mean_rel=0.2432079315185547, max_rel=1843.7498779296875, norm_rel=0.02173462137579918, ref_abs_avg=23.3150634765625, test_abs_avg=23.323680877685547
production_forward2 grad[71] vs paper_forward: mean_abs=0.40334010124206543, max_abs=1.625, mean_rel=0.07137727737426758, max_rel=5.575847625732422, norm_rel=0.021479258313775063, ref_abs_avg=19.372787475585938, test_abs_avg=19.391393661499023
production_forward2 grad[72] vs paper_forward: mean_abs=0.5224574208259583, max_abs=4.5, mean_rel=0.1474055051803589, max_rel=885.2582397460938, norm_rel=0.023011358454823494, ref_abs_avg=22.716182708740234, test_abs_avg=22.717512130737305
production_forward2 grad[73] vs paper_forward: mean_abs=0.4826923608779907, max_abs=3.28125, mean_rel=0.2218402624130249, max_rel=1296.8748779296875, norm_rel=0.02143637090921402, ref_abs_avg=22.518728256225586, test_abs_avg=22.522605895996094
production_forward2 grad[74] vs paper_forward: mean_abs=0.44571375846862793, max_abs=2.03125, mean_rel=0.1300632208585739, max_rel=23.277584075927734, norm_rel=0.02220657281577587, ref_abs_avg=20.40837287902832, test_abs_avg=20.425106048583984
production_forward2 grad[75] vs paper_forward: mean_abs=0.5677164793014526, max_abs=5.0, mean_rel=0.15734249353408813, max_rel=906.7312622070312, norm_rel=0.02436807006597519, ref_abs_avg=23.374399185180664, test_abs_avg=23.37647819519043
production_forward2 grad[76] vs paper_forward: mean_abs=0.525880753993988, max_abs=4.0, mean_rel=0.2838559150695801, max_rel=1882.8123779296875, norm_rel=0.022691944614052773, ref_abs_avg=23.21087074279785, test_abs_avg=23.214237213134766
production_forward2 grad[77] vs paper_forward: mean_abs=0.407109797000885, max_abs=1.96875, mean_rel=0.19996309280395508, max_rel=39.39223098754883, norm_rel=0.021604076027870178, ref_abs_avg=18.938186645507812, test_abs_avg=18.97374153137207
production_forward2 grad[78] vs paper_forward: mean_abs=0.5337204933166504, max_abs=5.25, mean_rel=0.14944647252559662, max_rel=1199.5548095703125, norm_rel=0.023769931867718697, ref_abs_avg=22.45486831665039, test_abs_avg=22.456188201904297
production_forward2 grad[79] vs paper_forward: mean_abs=0.4883383810520172, max_abs=3.5, mean_rel=0.2020980715751648, max_rel=2062.5, norm_rel=0.022205697372555733, ref_abs_avg=22.030048370361328, test_abs_avg=22.045757293701172
production_forward2 grad[80] vs paper_forward: mean_abs=0.3712242841720581, max_abs=1.46875, mean_rel=0.19196327030658722, max_rel=66.90557098388672, norm_rel=0.02072126232087612, ref_abs_avg=18.35239028930664, test_abs_avg=18.337129592895508
production_forward2 grad[81] vs paper_forward: mean_abs=0.49408891797065735, max_abs=4.5, mean_rel=0.15026932954788208, max_rel=768.6318969726562, norm_rel=0.023625437170267105, ref_abs_avg=20.979209899902344, test_abs_avg=20.980819702148438
production_forward2 grad[82] vs paper_forward: mean_abs=0.45477020740509033, max_abs=3.375, mean_rel=0.21095502376556396, max_rel=1374.9998779296875, norm_rel=0.02246205136179924, ref_abs_avg=20.310951232910156, test_abs_avg=20.31307601928711
production_forward2 grad[83] vs paper_forward: mean_abs=0.3718729019165039, max_abs=2.09375, mean_rel=0.0741443783044815, max_rel=4.621400356292725, norm_rel=0.022798284888267517, ref_abs_avg=16.32668685913086, test_abs_avg=16.31395721435547
production_forward2 grad[84] vs paper_forward: mean_abs=0.4614013433456421, max_abs=4.75, mean_rel=0.13822472095489502, max_rel=981.90234375, norm_rel=0.022734731435775757, ref_abs_avg=20.36647605895996, test_abs_avg=20.367225646972656
production_forward2 grad[85] vs paper_forward: mean_abs=0.41527193784713745, max_abs=3.5, mean_rel=0.20878344774246216, max_rel=1453.1248779296875, norm_rel=0.02091342583298683, ref_abs_avg=19.8358154296875, test_abs_avg=19.83414077758789
production_forward2 grad[86] vs paper_forward: mean_abs=0.3162655830383301, max_abs=1.1875, mean_rel=0.0676196962594986, max_rel=2.6755058765411377, norm_rel=0.01977410539984703, ref_abs_avg=15.952723503112793, test_abs_avg=15.948617935180664
production_forward2 grad[87] vs paper_forward: mean_abs=0.43499845266342163, max_abs=4.75, mean_rel=0.14265286922454834, max_rel=847.2681884765625, norm_rel=0.022417154163122177, ref_abs_avg=19.52667999267578, test_abs_avg=19.52743148803711
production_forward2 grad[88] vs paper_forward: mean_abs=0.3967609703540802, max_abs=3.5, mean_rel=0.18239206075668335, max_rel=1499.9998779296875, norm_rel=0.020325221121311188, ref_abs_avg=19.54051971435547, test_abs_avg=19.54108428955078
production_forward2 grad[89] vs paper_forward: mean_abs=0.32555070519447327, max_abs=1.552734375, mean_rel=0.3367338180541992, max_rel=135.13067626953125, norm_rel=0.02128916047513485, ref_abs_avg=15.598333358764648, test_abs_avg=15.592058181762695
production_forward2 grad[90] vs paper_forward: mean_abs=0.40474313497543335, max_abs=4.65625, mean_rel=0.13605904579162598, max_rel=605.9825439453125, norm_rel=0.021925203502178192, ref_abs_avg=18.595012664794922, test_abs_avg=18.595481872558594
production_forward2 grad[91] vs paper_forward: mean_abs=0.36872124671936035, max_abs=4.0, mean_rel=0.16477052867412567, max_rel=968.7499389648438, norm_rel=0.02046082727611065, ref_abs_avg=18.26932716369629, test_abs_avg=18.26389503479004
production_forward2 grad[92] vs paper_forward: mean_abs=0.31535983085632324, max_abs=1.25, mean_rel=0.09613369405269623, max_rel=3.8203701972961426, norm_rel=0.019977547228336334, ref_abs_avg=15.705026626586914, test_abs_avg=15.71048641204834
production_forward2 grad[93] vs paper_forward: mean_abs=0.39075565338134766, max_abs=4.75, mean_rel=0.12338507175445557, max_rel=469.1575012207031, norm_rel=0.02160273678600788, ref_abs_avg=18.29189682006836, test_abs_avg=18.292158126831055
production_forward2 grad[94] vs paper_forward: mean_abs=0.35021501779556274, max_abs=3.5, mean_rel=0.17179185152053833, max_rel=999.9999389648438, norm_rel=0.01961233653128147, ref_abs_avg=18.114578247070312, test_abs_avg=18.119138717651367
production_forward2 grad[95] vs paper_forward: mean_abs=0.28344061970710754, max_abs=1.25, mean_rel=0.21843037009239197, max_rel=69.2675552368164, norm_rel=0.02032732032239437, ref_abs_avg=14.120006561279297, test_abs_avg=14.121171951293945
production_forward2 grad[96] vs paper_forward: mean_abs=0.3628358244895935, max_abs=4.0, mean_rel=0.1305147409439087, max_rel=1157.9580078125, norm_rel=0.020963219925761223, ref_abs_avg=17.59404754638672, test_abs_avg=17.595088958740234
production_forward2 grad[97] vs paper_forward: mean_abs=0.3287043273448944, max_abs=3.0, mean_rel=0.19217319786548615, max_rel=2125.0, norm_rel=0.019281111657619476, ref_abs_avg=17.456478118896484, test_abs_avg=17.456960678100586

