SMSD Pro External Benchmark Suite
SMSD version : 6.8.1
Date         : 2026-04-01 13:06 UTC
Timeout      : 10000 ms per pair

Five benchmarks: one curated stress set and four community-standard public datasets:
  1. Stress pairs        — 12 adversarial hard cases
  2. Tautobase           — 468 tautomer pairs (Wahl & Sander 2020)
  3. Dalke random        — 1,000 low-similarity MCS pairs
  4. Dalke NN            — 1,000 nearest-neighbor MCS pairs
  5. Ehrlich-Rarey       — 1,400 SMARTS patterns (Ehrlich & Rarey 2011)

================================================================================
BENCHMARK 1: Stress Pairs — 12 Adversarial Hard Cases
Source: stress_pairs.tsv (BioInception curated)
Cases: cubane/cuneane, coronene, kekulene, macrocycles, cyclopeptides
================================================================================
  cubane-cuneane                      MCS=  8         0.7ms    Symmetric cages, heavy orbit pruning
  anthracene-phenanthrene             MCS= 14         0.3ms    Fused aromatic isomers, cyclic degeneracy
  crown-ether-15c5-18c6               MCS= 15         0.7ms    Macrocyclic symmetry, featureless chains
  cyclohexadecane-cyclooctadecane     MCS= 16         3.4ms    Large mono-elemental rings
  naphthalene-anthracene              MCS= 10         0.3ms    Fused PAH size mismatch
  cubane-self                         MCS=  8         0.4ms    Perfect symmetric self-match
  coronene-self                       MCS= 24         1.6ms    Large PAH self-match (24 atoms)
  neopentane-self                     MCS=  5         0.2ms    Highly branched symmetric
  adamantane-self                     MCS= 11         0.1ms    Cage self-match (10 heavy atoms)
  icosane-hexadecane                  MCS= 16         0.2ms    Long chain polymer-like
  kekulene-self                       MCS= 48         0.4ms    Massive symmetric PAH
  cyclopeptide-mutant                 MCS= 50       592.1ms    Dense cyclic peptide, >50 atoms

  Result: 12 pairs, 0 timeouts
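
  The per-pair rows above share one layout: name, MCS size, wall time in ms,
  and a free-text note. The following is a minimal Python sketch for reading
  such rows back, assuming that layout holds for every row. The names
  StressRow, ROW_RE, and parse_stress_row are illustrative and are not part
  of SMSD.

```python
import re
from typing import NamedTuple, Optional


class StressRow(NamedTuple):
    name: str
    mcs_size: int
    time_ms: float
    note: str


# Assumed row layout: <name> MCS=<int> <float>ms <free-text note>
ROW_RE = re.compile(r"^\s*(\S+)\s+MCS=\s*(\d+)\s+([\d.]+)ms\s+(.*\S)\s*$")


def parse_stress_row(line: str) -> Optional[StressRow]:
    """Return the parsed row, or None if the line is not a result row."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    name, size, ms, note = m.groups()
    return StressRow(name, int(size), float(ms), note)


if __name__ == "__main__":
    sample = "  cubane-cuneane    MCS=  8         0.7ms    Symmetric cages, heavy orbit pruning"
    print(parse_stress_row(sample))
```

  Lines that do not match the layout, such as the "Result:" summary, yield
  None, so the same function can be mapped over the whole section.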

================================================================================
BENCHMARK 2: Tautobase — Tautomer-Aware MCS
Source: chodera_tautobase_subset.txt (468 pairs)
Reference: Wahl & Sander, J. Chem. Inf. Model. 2020, 60(3):1085-1089
Metric: tautomer-aware MCS vs strict MCS; % full heavy-atom match
================================================================================
  Pairs tested:       468
  Full match:         437 (93.4%)
  Partial gain:       0
  No gain:            31
  Total atoms gained: +2
  Median time:        186 us
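
  The tallies above can be cross-checked with a few lines of Python. This
  sketch assumes the three categories (full match, partial gain, no gain)
  are mutually exclusive and together cover every pair tested; the function
  name tally_check is hypothetical.

```python
def tally_check(pairs: int, full: int, partial: int, none: int) -> float:
    """Check that the categories sum to the pair count; return the full-match %."""
    if full + partial + none != pairs:
        raise ValueError("categories do not sum to pairs tested")
    return 100.0 * full / pairs


if __name__ == "__main__":
    # Figures copied from the Benchmark 2 summary.
    pct = tally_check(pairs=468, full=437, partial=0, none=31)
    print(f"Full match: {pct:.1f}%")  # prints "Full match: 93.4%"
```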

================================================================================
BENCHMARK 3: Dalke Random Pairs — Low-Similarity MCS
Source: dalke_random_pairs.tsv (1,000 pairs from ChEMBL drug-like set)
Reference: Dalke, J. Cheminform. 2013 (FMCS benchmark protocol)
Metric: median MCS time, LFUB certificate rate (fast-exit fraction)
================================================================================
  Pairs tested:  1000
  Timeouts:      0
  Median time:   462516 us (462.5 ms)
  Mean MCS size: 14.8 atoms

================================================================================
BENCHMARK 4: Dalke Nearest-Neighbor Pairs — High-Similarity MCS
Source: dalke_nn_pairs.tsv (1,000 pairs, Tanimoto >= 0.7)
Reference: Dalke, J. Cheminform. 2013 (FMCS benchmark protocol)
Metric: median MCS time, MCS >= 5 atoms rate
================================================================================
  Pairs tested:      1000
  Timeouts:          0
  MCS >= 5 atoms:    992 (99.2%)
  Median time:       583 us
  Mean MCS size:     25.4 atoms
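
  Benchmarks 3 and 4 both report medians in microseconds, but the values
  differ by orders of magnitude. This short Python sketch puts them on a
  common scale; the helper us_to_ms is illustrative, and the two constants
  are copied from the summaries of Benchmarks 3 and 4.

```python
US_PER_MS = 1000.0


def us_to_ms(us: float) -> float:
    """Convert microseconds to milliseconds."""
    return us / US_PER_MS


RANDOM_MEDIAN_US = 462516  # Benchmark 3, Dalke random pairs
NN_MEDIAN_US = 583         # Benchmark 4, Dalke nearest-neighbor pairs

if __name__ == "__main__":
    print(f"random pairs: {us_to_ms(RANDOM_MEDIAN_US):.1f} ms")
    print(f"NN pairs:     {us_to_ms(NN_MEDIAN_US):.3f} ms")
    print(f"ratio:        {RANDOM_MEDIAN_US / NN_MEDIAN_US:.0f}x")
```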

================================================================================
BENCHMARK 5: Ehrlich-Rarey SMARTS v2.0 — Substructure Search
Source: ehrlich_rarey_smarts.txt (1,400 patterns, ZBH Hamburg)
Reference: Ehrlich & Rarey, J. Chem. Inf. Model. 2011, 51(6):1316-1324
Target: ibuprofen (CC(C)Cc1ccc(C(C)C(O)=O)cc1)
Metric: median query time (us), hit rate
================================================================================
  Patterns tested: 1395
  Matched:         94 (6.7%)
  Unsupported:     5
  Median time:     10 us
