mamf-finder.py: Matrix Multiply Performance Finder
dtype: fp8_e4m3 | device: NVIDIA H100 80GB HBM3 | GPU 0

  M     N     K   TFLOPS
1024  1024  1024   625.0
2048  2048  2048  1134.5
4096  4096  4096  1578.3
8192  8192  8192  1782.6

Best: M=8192, N=8192, K=8192, 1782.6 TFLOPS
