
============================================================
ACCURACY RESULTS
============================================================

Source-wise Results:
--------------------------------------------------
Source                          Score      Count
--------------------------------------------------
NEON_benchmark                  0.576      33
Radogoshi et al. 2021           0.469      286
Kwon et al. 2023                0.426      5
SelvaBox                        0.320      253
Weecology_University_Florida    0.313      461
Velasquez-Camacho et al. 2023   0.227      176
Zamboni et al. 2021             0.210      42
Dumortier et al. 2025           0.154      94
Reiersen et al. 2022            0.114      12
Sun et al. 2022                 0.097      27
OAM-TCD                         0.059      784
World Resources Institute       0.057      94
Santos et al. 2019              0.001      95

Summary Statistics:
----------------------------------------
Average accuracy: 0.211
Worst-group accuracy: 0.001
Min accuracy: 0.001
Max accuracy: 0.576
Std accuracy: 0.171
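
The accuracy summary above appears consistent with a sample-weighted average and a population standard deviation over the 13 per-source scores. The following is a minimal sketch that recomputes those figures, not the evaluation code that produced this log. The `n` values are taken from the `source_id` lines of this log and paired with source names by matching scores, which is an assumption; `ACCURACY` and `summarize` are hypothetical names.

```python
import math

# (source, accuracy, n). Scores come from the table above; n comes from the
# per-source_id lines in this log, paired to sources by matching scores
# (an assumption, not something the log states directly).
ACCURACY = [
    ("NEON_benchmark", 0.576, 33),
    ("Radogoshi et al. 2021", 0.469, 286),
    ("Kwon et al. 2023", 0.426, 5),
    ("SelvaBox", 0.320, 253),
    ("Weecology_University_Florida", 0.313, 461),
    ("Velasquez-Camacho et al. 2023", 0.227, 176),
    ("Zamboni et al. 2021", 0.210, 42),
    ("Dumortier et al. 2025", 0.154, 94),
    ("Reiersen et al. 2022", 0.114, 12),
    ("Sun et al. 2022", 0.097, 27),
    ("OAM-TCD", 0.059, 784),
    ("World Resources Institute", 0.057, 94),
    ("Santos et al. 2019", 0.001, 95),
]


def summarize(rows):
    """Recompute summary statistics from (source, score, n) rows."""
    scores = [score for _, score, _ in rows]
    counts = [n for _, _, n in rows]
    # Average weighted by the number of samples in each source.
    weighted = sum(s * n for s, n in zip(scores, counts)) / sum(counts)
    # Unweighted mean across sources, used only for the spread.
    mean = sum(scores) / len(scores)
    # Population standard deviation (divide by N, not N - 1).
    pop_std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return {
        "average": weighted,
        "worst_group": min(scores),
        "max": max(scores),
        "std": pop_std,
    }


summary = summarize(ACCURACY)
print({key: round(value, 3) for key, value in summary.items()})
```

Under these assumptions the unweighted mean across sources is higher than the reported average, because the largest source (n = 784) has one of the lowest scores.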

============================================================
RECALL RESULTS
============================================================

Source-wise Results:
--------------------------------------------------
Source                          Score      Count
--------------------------------------------------
NEON_benchmark                  0.779      33
Radogoshi et al. 2021           0.594      286
Kwon et al. 2023                0.506      5
Zamboni et al. 2021             0.484      42
Velasquez-Camacho et al. 2023   0.483      176
Weecology_University_Florida    0.444      461
SelvaBox                        0.426      253
Dumortier et al. 2025           0.228      94
Reiersen et al. 2022            0.183      12
World Resources Institute       0.108      94
Sun et al. 2022                 0.104      27
OAM-TCD                         0.103      784
Santos et al. 2019              0.016      95

Summary Statistics:
----------------------------------------
Average recall: 0.311
Worst-group recall: 0.016
Min recall: 0.016
Max recall: 0.779
Std recall: 0.224
Average detection_acc across source: nan
Average detection_accuracy: 0.211
  source_id = 0  [n =     94]:	detection_accuracy = 0.154
  source_id = 1  [n =      5]:	detection_accuracy = 0.426
  source_id = 2  [n =     33]:	detection_accuracy = 0.576
  source_id = 3  [n =    784]:	detection_accuracy = 0.059
  source_id = 4  [n =    286]:	detection_accuracy = 0.469
  source_id = 5  [n =     12]:	detection_accuracy = 0.114
  source_id = 6  [n =     95]:	detection_accuracy = 0.001
  source_id = 7  [n =    253]:	detection_accuracy = 0.320
  source_id = 8  [n =     27]:	detection_accuracy = 0.097
  source_id = 9  [n =    176]:	detection_accuracy = 0.227
  source_id = 10  [n =    461]:	detection_accuracy = 0.313
  source_id = 11  [n =     94]:	detection_accuracy = 0.057
  source_id = 12  [n =     42]:	detection_accuracy = 0.210
Worst-group detection_accuracy: 0.001
Average detection_recall: 0.311
  source_id = 0  [n =     94]:	detection_recall = 0.228
  source_id = 1  [n =      5]:	detection_recall = 0.506
  source_id = 2  [n =     33]:	detection_recall = 0.779
  source_id = 3  [n =    784]:	detection_recall = 0.103
  source_id = 4  [n =    286]:	detection_recall = 0.594
  source_id = 5  [n =     12]:	detection_recall = 0.183
  source_id = 6  [n =     95]:	detection_recall = 0.016
  source_id = 7  [n =    253]:	detection_recall = 0.426
  source_id = 8  [n =     27]:	detection_recall = 0.104
  source_id = 9  [n =    176]:	detection_recall = 0.483
  source_id = 10  [n =    461]:	detection_recall = 0.444
  source_id = 11  [n =     94]:	detection_recall = 0.108
  source_id = 12  [n =     42]:	detection_recall = 0.484
Worst-group detection_recall: 0.016
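
The grouped `source_id` lines above follow a regular pattern, so they can be pulled into structured rows for further analysis. The following is a minimal sketch assuming that line format stays as shown; `LINE_RE` and `parse_group_lines` are hypothetical names, not part of the tool that wrote this log.

```python
import re

# Matches lines such as:
#   source_id = 0  [n =     94]:<TAB>detection_recall = 0.228
# The value may also be the literal "nan".
LINE_RE = re.compile(
    r"source_id\s*=\s*(\d+)\s*\[n\s*=\s*(\d+)\]:\s*(\w+)\s*=\s*([0-9.]+|nan)"
)


def parse_group_lines(text):
    """Return (source_id, n, metric_name, value) tuples found in the text."""
    rows = []
    for match in LINE_RE.finditer(text):
        rows.append(
            (int(match[1]), int(match[2]), match[3], float(match[4]))
        )
    return rows


# Two lines copied from the log above.
sample = (
    "  source_id = 0  [n =     94]:\tdetection_recall = 0.228\n"
    "  source_id = 1  [n =      5]:\tdetection_recall = 0.506\n"
)
rows = parse_group_lines(sample)
for source_id, n, metric, value in rows:
    print(source_id, n, metric, value)
```

Keeping `n` alongside each value makes it easy to compare weighted and unweighted averages across sources.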
