Reinforcement learning from verifiable rewards (RLVR) has become the dominant training signal for frontier reasoning models, yet existing verified environments are concentrated in symbolic and code-centred tasks. Scientific inverse problems (CT, MRI, compressed sensing, phase retrieval) remain unmeasured despite their continuous, ill-posed, uncertainty-sensitive structure. We release ten RL environments spanning five scientific modalities with two design properties absent from current benchmarks: (i) every reward is split-conformal calibrated to a target 1−α coverage, so honest posterior width is rewarded alongside point-estimate quality; and (ii) every measurement is procedurally regenerated per query, making fixed-string contamination effectively impossible by construction, with roughly 10^22 effective instances per environment. On 50 paired (environment, model) comparisons across six frontier models, classical baselines significantly outperform the paired LLM in 32 of 50 at p<0.05 (both uncorrected and Bonferroni-corrected), with a pooled mean Δ = +0.199 (10,000 paired bootstrap resamples). The top LLMs (Haiku 4.5, Opus 4.7, Sonnet 4.6) reach cross-environment mean scores of 0.53–0.56, below the classical mean of 0.630. An earlier oracle-delegation artefact (r=0.858) in the tool-use environment was removed; primitive-only reruns across all six models cluster at 0.40–0.55, versus 0.87 for the classical orthogonal matching pursuit (OMP) baseline. Empirical conformal coverage across all ten environments is 0.9013 ± 0.0166 against the 0.90 target (N=200). The environments are MIT-licensed and available on the Prime Intellect Hub.
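
A minimal sketch of the split-conformal recipe behind property (i), assuming standard absolute-residual nonconformity scores; the reward shape (`calibrated_reward`, `width_scale`) is a hypothetical illustration of rewarding honest posterior width, not the released environments' exact scoring function.

```python
import numpy as np

def split_conformal_quantile(cal_scores: np.ndarray, alpha: float = 0.10) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of nonconformity scores.

    With n exchangeable calibration scores, the ceil((n + 1) * (1 - alpha)) / n
    empirical quantile guarantees marginal coverage of at least 1 - alpha.
    """
    n = len(cal_scores)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, q_level, method="higher"))

def calibrated_reward(y_true: float, lo: float, hi: float, width_scale: float) -> float:
    # Hypothetical reward shape: full credit for covering y_true, minus a
    # penalty proportional to normalized interval width, so an honest-but-tight
    # posterior scores higher than either an overconfident or a vacuous one.
    covered = float(lo <= y_true <= hi)
    width_penalty = min((hi - lo) / width_scale, 1.0)
    return covered - 0.5 * width_penalty

# Usage: calibrate on held-out residuals, then expect ~90% coverage on fresh draws.
rng = np.random.default_rng(0)
residuals = np.abs(rng.normal(size=500))          # |y - y_hat| on a calibration split
qhat = split_conformal_quantile(residuals, alpha=0.10)
# A prediction y_hat then gets the interval [y_hat - qhat, y_hat + qhat].
```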

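A sketch of the paired-bootstrap comparison reported above (10,000 resamples over matched classical-vs-LLM score pairs); the function name and the one-sided p-value convention are illustrative assumptions, not the paper's exact analysis code.

```python
import numpy as np

def paired_bootstrap(classical: np.ndarray, llm: np.ndarray,
                     n_boot: int = 10_000, seed: int = 0):
    """Paired bootstrap over matched (environment, model) scores.

    Resamples score pairs with replacement and returns the observed mean
    difference plus a one-sided p-value for H0: mean(classical - llm) <= 0.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(classical) - np.asarray(llm)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)           # one mean per resample
    return diffs.mean(), float((boot_means <= 0.0).mean())
```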