Performance by phenotype

Mean per-screen AnDCG@100 across the five coarse phenotype classes used in the paper. Pick a model set to reproduce Figure 3 or to compare any subset of the methods in the cache.

From §5.4: Predictive performance was highest for the viability screens, likely because their hit genes are enriched for conserved cellular dependencies that recur across screens. Other phenotypes, such as host-pathogen response or molecular reporter activity, appear more context-specific and therefore harder to predict from generic biological knowledge alone.

Cell values are the mean AnDCG@100 across screens with the listed phenotype. The n in each row header is the number of unique screens that contribute to the row. Hover a cell for the underlying number.
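The aggregation behind each cell can be sketched as below. This is a minimal illustration, not the code that generates the page. The function name `mean_score_by_phenotype` and the `(screen_id, phenotype, score)` record layout are assumptions made for the example, and the scores are toy values rather than results from the paper.

```python
from collections import defaultdict


def mean_score_by_phenotype(records):
    """Average per-screen scores within each phenotype.

    `records` is an iterable of (screen_id, phenotype, score) tuples.
    This layout is hypothetical; adapt it to however the cache stores results.
    """
    # Keep one score per unique (phenotype, screen) pair so that a repeated
    # record does not inflate n. The last record seen for a pair wins.
    per_screen = {}
    for screen_id, phenotype, score in records:
        per_screen[(phenotype, screen_id)] = score

    grouped = defaultdict(list)
    for (phenotype, _screen_id), score in per_screen.items():
        grouped[phenotype].append(score)

    # n is the number of unique screens contributing to the row.
    return {
        phenotype: {"n": len(scores), "mean": sum(scores) / len(scores)}
        for phenotype, scores in grouped.items()
    }


# Toy input: made-up scores for three screens in two phenotype classes.
toy_records = [
    ("screen_a", "viability", 0.5),
    ("screen_b", "viability", 0.25),
    ("screen_c", "reporter", 0.125),
]
summary = mean_score_by_phenotype(toy_records)
```

With the toy input, the viability row averages two screens and the reporter row averages one, which mirrors how n in the row header relates to the cell value.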

Why phenotype matters

AssayBench combines screens spanning fitness/proliferation/viability, drug/chemical response, host-pathogen response, molecular reporter activity, and trafficking/localization. Viability screens are the most predictable, likely because the same essential genes recur across cell lines and contexts. This recurrence also helps explain why the phenotype-based hit-frequency baseline does so well overall, despite being a simple prior. The same predictability does not transfer to phenotypes where the hit set is more context-specific.
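To make the idea of a hit-frequency prior concrete, here is a rough sketch. It assumes the baseline ranks genes by how often they were hits in other screens of the same phenotype. The exact definition used in the paper is not given on this page, so treat the function `hit_frequency_ranking`, the dictionary layout, and the tie-breaking rule as hypothetical. The gene lists are toy data.

```python
from collections import Counter


def hit_frequency_ranking(training_screens, phenotype, top_k=100):
    """Rank genes by how many same-phenotype training screens list them as hits.

    Assumed (hypothetical) input: a list of dicts with keys "phenotype" and "hits".
    """
    counts = Counter()
    for screen in training_screens:
        if screen["phenotype"] != phenotype:
            continue
        # Count each gene at most once per screen.
        counts.update(set(screen["hits"]))

    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda gene: (-counts[gene], gene))
    return ranked[:top_k]


# Toy training set: illustrative hit lists, not data from the benchmark.
toy_screens = [
    {"phenotype": "viability", "hits": ["RPL3", "POLR2A"]},
    {"phenotype": "viability", "hits": ["RPL3"]},
    {"phenotype": "reporter", "hits": ["TP53"]},
]
viability_prior = hit_frequency_ranking(toy_screens, "viability")
```

Under this assumption, a gene that recurs across many viability screens rises to the top of the ranking for any new viability screen. For a phenotype whose hits rarely repeat across screens, the counts stay flat and the ranking carries little signal.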