======================================================================
Round 38: DCB vs R Benchmark Summary
======================================================================
diffcb version : 0.1.12
PyTorch        : 2.5.1+cu121
CUDA available : Yes (NVIDIA GeForce RTX 4060 Laptop GPU)
R available    : Yes (C:\Program Files\R\R-4.6.0\bin\Rscript.exe)

--- Wall-clock timing, bimodal distribution (speedup = R_ms / DCB_ms) ---
           n  device      DCB_ms        R_ms     speedup
       1,000     cpu     689.210     841.514        1.22
       1,000    cuda     140.399     841.514        5.99
       5,000     cpu    3267.562     835.870        0.26
       5,000    cuda     607.322     835.870        1.38
      10,000     cpu    6567.783    1194.846        0.18
      10,000    cuda    1163.539    1194.846        1.03
      50,000     cpu     580.771     866.960        1.49
      50,000    cuda     134.734     866.960        6.43
     100,000     cpu     462.512     925.104        2.00
     100,000    cuda     133.989     925.104        6.90
     500,000     cpu    1267.579    1399.683        1.10
     500,000    cuda     136.586    1399.683       10.25
   1,000,000     cpu     993.638    2057.586        2.07
   1,000,000    cuda     137.072    2057.586       15.01
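
The speedup column appears to be R_ms / DCB_ms, so values above 1 mean DCB finished faster than R; every row of the timing table is consistent with that ratio. The short Python sketch below recomputes it from the rows above as a sanity check. The `speedup` helper and the `ROWS` list are illustrative only and are not part of the diffcb API.

```python
# Illustrative check (not diffcb code): recompute the speedup column of the
# wall-clock timing table as R_ms / DCB_ms and compare with the reported value.

# (n, device, DCB_ms, R_ms, reported_speedup), copied from the table above
ROWS = [
    (1_000, "cpu", 689.210, 841.514, 1.22),
    (1_000, "cuda", 140.399, 841.514, 5.99),
    (5_000, "cpu", 3267.562, 835.870, 0.26),
    (5_000, "cuda", 607.322, 835.870, 1.38),
    (10_000, "cpu", 6567.783, 1194.846, 0.18),
    (10_000, "cuda", 1163.539, 1194.846, 1.03),
    (50_000, "cpu", 580.771, 866.960, 1.49),
    (50_000, "cuda", 134.734, 866.960, 6.43),
    (100_000, "cpu", 462.512, 925.104, 2.00),
    (100_000, "cuda", 133.989, 925.104, 6.90),
    (500_000, "cpu", 1267.579, 1399.683, 1.10),
    (500_000, "cuda", 136.586, 1399.683, 10.25),
    (1_000_000, "cpu", 993.638, 2057.586, 2.07),
    (1_000_000, "cuda", 137.072, 2057.586, 15.01),
]


def speedup(dcb_ms: float, r_ms: float) -> float:
    """Speedup of DCB over R; a value > 1 means DCB was faster."""
    return r_ms / dcb_ms


if __name__ == "__main__":
    for n, device, dcb_ms, r_ms, reported in ROWS:
        s = speedup(dcb_ms, r_ms)
        print(f"{n:>10,} {device:>5} {s:6.2f} (reported {reported:.2f})")
```

Note that R_ms is shared by the cpu and cuda rows for each n, so only the DCB time varies between those two rows.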

--- Same-sample accuracy: mean overest_pct by n and distribution ---
           n        dist    mean_overest%       min       max
       1,000     bimodal           -0.000    -0.000    -0.000
       1,000    gaussian           -0.002    -0.003    -0.002
      10,000     bimodal           -0.000    -0.000    -0.000
      10,000    gaussian           -0.004    -0.008    -0.000
     100,000     bimodal           -0.000    -0.001    +0.001
     100,000    gaussian           +0.027    -0.075    +0.132

--- Independent-sample accuracy: mean |diff|% by n and distribution ---
           n        dist     mean_|diff|%       min       max
       1,000     bimodal            1.814     0.456     3.572
       1,000    gaussian           33.084     0.383    84.843
      10,000     bimodal            0.465     0.025     1.390
      10,000    gaussian           76.316    22.712   353.308
     100,000     bimodal            0.176     0.020     0.413
     100,000    gaussian           42.854     0.865    99.952

--- Large-n DCB standalone scaling ---
              n  device        DCB_ms
      1,000,000     cpu        1126.9
      1,000,000    cuda         152.4
     10,000,000     cpu        1340.7
     10,000,000    cuda         149.2
    100,000,000     cpu        1416.8
    100,000,000    cuda         161.5
  1,000,000,000     cpu        1473.5
  1,000,000,000    cuda         131.9
