HPC-B2 Benchmark Summary (B2 vs B1, CUDA only)
======================================================================

GPU: NVIDIA GeForce RTX 4060 Laptop GPU
Warmup runs: 1, Timed runs: 5 (best-of)
B1 baseline: loaded from C:\Users\lenovo\dcb_exp\results\round39_b1_benchmark.csv
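
The "Warmup runs: 1, Timed runs: 5 (best-of)" setting above can be sketched as a small timing harness. This is a minimal illustration, not the harness that produced this report: the name best_of and its parameters are hypothetical, and it assumes the timed callable blocks until the GPU work has finished (for CUDA that normally means synchronizing the device before returning).

```python
import time


def best_of(fn, warmup=1, runs=5):
    """Return the fastest wall-clock time of fn() in milliseconds.

    Hypothetical helper: assumes fn() returns only after its work is done
    (for GPU code, after a device synchronization).
    """
    # Untimed warmup calls absorb one-off costs such as lazy initialization.
    for _ in range(warmup):
        fn()

    best_ms = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best_ms = min(best_ms, elapsed_ms)
    return best_ms


if __name__ == "__main__":
    # Stand-in CPU workload, used only to show the call shape.
    print(best_of(lambda: sum(range(100_000))))
```

Taking the minimum rather than the mean reports the least-disturbed run, which is why a single slow outlier does not show up in the table.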

   K         n       b1_ms       b2_ms    speedup        b1_dps        b2_dps
---------------------------------------------------------------------------
   1     50000      70.753      16.157     4.3790          14.1          61.9
   4     50000     107.248      26.716     4.0144          37.3         149.7
   8     50000     187.621      45.909     4.0868          42.6         174.3
  16     50000     300.646      91.251     3.2947          53.2         175.3
  32     50000     579.610     589.091     0.9839          55.2          54.3
  64     50000    1167.859    1209.524     0.9656          54.8          52.9
   1    100000      78.873      72.790     1.0836          12.7          13.7
   4    100000     125.825     123.145     1.0218          31.8          32.5
   8    100000     189.824     187.963     1.0099          42.1          42.6
  16    100000     312.520     337.412     0.9262          51.2          47.4
  32    100000     590.102     582.102     1.0137          54.2          55.0
  64    100000    1062.572    1142.757     0.9298          60.2          56.0
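
The derived columns in the table can be reproduced from K and the two timing columns. The sketch below is inferred from the numbers themselves rather than taken from the benchmark script: speedup matches b1_ms / b2_ms, and both dps columns match K * 1000 / ms. The report does not expand the abbreviation "dps", and the function name derived_columns is hypothetical.

```python
def derived_columns(k, b1_ms, b2_ms):
    """Recompute (speedup, b1_dps, b2_dps) for one table row.

    Formulas are inferred from the table values, not from the
    original benchmark code.
    """
    speedup = b1_ms / b2_ms        # > 1 means B2 is faster than B1
    b1_dps = k * 1000.0 / b1_ms    # K per second of B1 runtime
    b2_dps = k * 1000.0 / b2_ms    # K per second of B2 runtime
    return round(speedup, 4), round(b1_dps, 1), round(b2_dps, 1)


if __name__ == "__main__":
    # Row K=4, n=50000 from the table above.
    print(derived_columns(4, 107.248, 26.716))  # (4.0144, 37.3, 149.7)
```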


Peak B2 vs B1 speedup analysis
======================================================================
  n=  50000: best speedup=4.379x at K=1
  n= 100000: best speedup=1.084x at K=1
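
The per-n peaks listed above amount to a maximum over the speedup column within each n group. A minimal sketch, using the speedup values from the table; the names ROWS and best_per_n are hypothetical and not taken from the benchmark script.

```python
# (K, n, speedup) triples copied from the summary table.
ROWS = [
    (1, 50000, 4.379), (4, 50000, 4.0144), (8, 50000, 4.0868),
    (16, 50000, 3.2947), (32, 50000, 0.9839), (64, 50000, 0.9656),
    (1, 100000, 1.0836), (4, 100000, 1.0218), (8, 100000, 1.0099),
    (16, 100000, 0.9262), (32, 100000, 1.0137), (64, 100000, 0.9298),
]


def best_per_n(rows):
    """Map each n to the (K, speedup) pair with the highest speedup."""
    best = {}
    for k, n, speedup in rows:
        if n not in best or speedup > best[n][1]:
            best[n] = (k, speedup)
    return best


if __name__ == "__main__":
    for n, (k, speedup) in best_per_n(ROWS).items():
        print(f"  n={n:>7}: best speedup={speedup:.3f}x at K={k}")
    # Prints:
    #   n=  50000: best speedup=4.379x at K=1
    #   n= 100000: best speedup=1.084x at K=1
```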


B2 regressions (b2_ms exceeds b1_ms by more than 5%):
  K=16 n=100000: b2=337.412ms b1=312.520ms speedup=0.926x  *** REGRESSION ***
  K=64 n=100000: b2=1142.757ms b1=1062.572ms speedup=0.930x  *** REGRESSION ***
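
The 5% rule used for the list above can be written as a one-line check. This is a sketch of the stated criterion only; the function name is_regression and the tolerance parameter are hypothetical. It also shows why K=64, n=50000 (speedup 0.9656) is not flagged: B2 is about 3.6% slower there, which is inside the tolerance.

```python
def is_regression(b1_ms, b2_ms, tolerance=0.05):
    """True when B2 is slower than B1 by more than `tolerance` (5% by default)."""
    return b2_ms > b1_ms * (1.0 + tolerance)


if __name__ == "__main__":
    print(is_regression(312.52, 337.412))     # K=16 n=100000 -> True
    print(is_regression(1062.572, 1142.757))  # K=64 n=100000 -> True
    print(is_regression(1167.859, 1209.524))  # K=64 n=50000  -> False
```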
