Gene-set bias
For each (model, screen) pair, we score how strongly the model's top-100 predicted genes over- or under-represent curated gene families relative to the screen's true top-100 genes. Positive cells mean the model over-predicts that family; negative cells mean it under-predicts it.
From §5.6: GPT models over-represent cell-cycle genes, Gemini models over-represent developmental-biology genes, and all models over-represent disease-associated genes, likely reflecting the prevalence of those genes in the training corpus.
Bias is computed as (fraction of predicted top-100 genes in the set) minus (fraction of true top-100 genes in the set). Models are grouped by total parameter count (Small < 10B, Medium < 50B, Large < 250B, Very Large < 1T, Frontier).
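The bias score is simple arithmetic on two ranked lists. Below is a minimal Python sketch of it, assuming each model's predictions and each screen's ground truth are ranked gene lists (best first) and each curated family is a set of gene symbols. The function name `gene_set_bias`, its parameters, and the toy gene lists are hypothetical and only illustrate the definition.

```python
from typing import Iterable, Sequence


def gene_set_bias(
    predicted: Sequence[str],
    truth: Sequence[str],
    gene_set: Iterable[str],
    k: int = 100,
) -> float:
    """Fraction of predicted top-k genes in gene_set minus fraction of true top-k genes in gene_set.

    Positive: the model over-represents the family. Negative: it under-represents it.
    """
    # Truncate both ranked lists to their top-k entries.
    pred_top = list(predicted)[:k]
    true_top = list(truth)[:k]
    if not pred_top or not true_top:
        raise ValueError("both ranked lists must be non-empty")
    members = set(gene_set)
    # Share of each top-k list that falls inside the gene family.
    pred_frac = sum(g in members for g in pred_top) / len(pred_top)
    true_frac = sum(g in members for g in true_top) / len(true_top)
    return pred_frac - true_frac


# Hypothetical toy example using k=4 instead of 100 (not real screen data).
predicted = ["CDK1", "CCNB1", "TP53", "MYC"]
truth = ["TP53", "MYC", "GAPDH", "ACTB"]
cell_cycle = {"CDK1", "CCNB1", "CDK2"}
print(gene_set_bias(predicted, truth, cell_cycle, k=4))  # 0.5
```

In the toy case, half of the predicted genes are in the cell-cycle set while none of the true genes are, giving a bias of 0.5 - 0.0 = 0.5. Swapping the two lists gives -0.5. Because the score is a difference of two fractions, it always lies between -1 and +1.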