Per-screen explorer

AssayBench evaluates 1,920 screens; the paper highlights a handful. Use this view to pick any single screen and inspect how every model ranked its gene list, or pick a model and rank all screens by its per-screen score.

The data behind this view is the same per-example score table that powers Figure 3 of the paper: adjusted nDCG@100 (AnDCG@100) and Precision@100 for every (model, screen) pair on the year split, including the LaTest novel cohort.
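For readers less familiar with the two metrics, here is a minimal sketch of Precision@k and plain nDCG@k with binary relevance. It does not reproduce the adjustment that AssayBench applies to nDCG; the function names, the binary-relevance assumption, and the toy gene lists are illustrative only and are not taken from the paper.

```python
import math


def precision_at_k(ranked, relevant, k=100):
    """Fraction of the top-k ranked genes that are true hits."""
    top = ranked[:k]
    return sum(1 for gene in top if gene in relevant) / k


def ndcg_at_k(ranked, relevant, k=100):
    """Standard (unadjusted) nDCG@k with binary relevance.

    Assumption: each gene is either a hit (gain 1) or not (gain 0).
    The benchmark's adjusted variant is NOT implemented here.
    """
    # Discounted gain of the hits found in the top k positions.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, gene in enumerate(ranked[:k])
        if gene in relevant
    )
    # Best achievable DCG: all hits packed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example with hypothetical gene names.
ranked_genes = ["A", "B", "C", "D"]
true_hits = {"A", "C"}
p_at_4 = precision_at_k(ranked_genes, true_hits, k=4)
n_at_4 = ndcg_at_k(ranked_genes, true_hits, k=4)
```

Both scores lie in [0, 1], and a ranking that places every hit first scores an nDCG of 1.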


Reading the chart

In "by screen" mode each row is a model and the marker is its score on the selected screen, with rows sorted from best to worst. In "by model" mode each row is a coarse phenotype class; the box summarizes the AnDCG@100 distribution across all screens in that class, and each dot is one screen (hover for the screen name and phenotype). Grouping by class makes it easy to see where the selected model is strong, where it is weak, and which phenotypes it struggles on.
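The two modes correspond to two simple aggregations over a long-format score table. The sketch below assumes one row per (model, screen) pair; the column names (`model`, `screen`, `phenotype_class`, `andcg_100`) and every value in `rows` are made up for illustration and are not the actual AssayBench schema or results.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows; real scores come from the per-example score table.
rows = [
    {"model": "model_a", "screen": "screen_1", "phenotype_class": "proliferation", "andcg_100": 0.30},
    {"model": "model_b", "screen": "screen_1", "phenotype_class": "proliferation", "andcg_100": 0.45},
    {"model": "model_a", "screen": "screen_2", "phenotype_class": "proliferation", "andcg_100": 0.10},
    {"model": "model_b", "screen": "screen_2", "phenotype_class": "proliferation", "andcg_100": 0.25},
    {"model": "model_a", "screen": "screen_3", "phenotype_class": "reporter", "andcg_100": 0.20},
]


def by_screen(rows, screen):
    """One entry per model for the chosen screen, best score first."""
    picked = [r for r in rows if r["screen"] == screen]
    return sorted(picked, key=lambda r: r["andcg_100"], reverse=True)


def by_model(rows, model):
    """Per-phenotype-class summary of one model's scores across screens."""
    groups = defaultdict(list)
    for r in rows:
        if r["model"] == model:
            groups[r["phenotype_class"]].append(r["andcg_100"])
    # A coarse stand-in for the box in the chart: count, min, median, max.
    return {
        cls: {"n": len(v), "min": min(v), "median": median(v), "max": max(v)}
        for cls, v in groups.items()
    }


screen_view = by_screen(rows, "screen_1")
model_view = by_model(rows, "model_a")
```

Here `screen_view` lists the models for one screen in descending score order, and `model_view` gives one summary per phenotype class, which is the information the box and dots convey visually.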