Cross-validation is routinely used to estimate out-of-sample performance in statistical learning, but standard shuffled or blocked folds can be invalid when responses are measured over future intervals. A label such as the mean demand over the next twelve half-hours, the next-day rainfall amount, or the return over the next twenty bars overlaps the labels of nearby rows. If overlapping label intervals are split between training and test sets, the validation score partly measures information reuse rather than generalization. This article formalizes split-level conditions for leakage-aware validation in overlapping-label time-series and panel data, and presents purgedcv, a Python implementation that exposes purging, embargoing, walk-forward validation, group-purged folds, and combinatorial purged cross-validation through the scikit-learn splitter protocol, with diagnostic assertions for auditing train/test splits. A controlled experiment with an unpredictable target shows that shuffled k-fold can report a mean out-of-sample R² of 0.918 while admitting complete train/test label overlap. A full-population benchmark on Low Carbon London smart-meter data shows a more nuanced case: the temporal leakage gap is small but measurable, whereas the larger issue is household-level generalization. The software, notebooks, tests, and benchmark scripts are open source and make the validation choice auditable rather than implicit.
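
To make the purging and embargoing idea concrete, the following is a minimal NumPy sketch and not the purgedcv API. It assumes that row i carries a label measured over rows i to i + horizon inclusive, and that test folds are contiguous blocks. The function name `purged_kfold_indices` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def purged_kfold_indices(n_samples, horizon, n_splits=5, embargo=0):
    """Yield (train, test) index arrays for contiguous test folds.

    Illustrative assumption: row i has a label measured over rows
    i .. i + horizon (inclusive), so two rows share label information
    whenever those intervals intersect.
    """
    indices = np.arange(n_samples)
    for test in np.array_split(indices, n_splits):
        test_start = test[0]
        test_end = test[-1] + horizon  # last row covered by any test label
        # Purge: drop rows whose label interval [i, i + horizon]
        # intersects the test label span [test_start, test_end].
        overlaps = (indices + horizon >= test_start) & (indices <= test_end)
        # Embargo: also drop the `embargo` rows right after the test span.
        embargoed = (indices > test_end) & (indices <= test_end + embargo)
        train = indices[~(overlaps | embargoed)]
        yield train, test
```

Under these assumptions, every training row's label interval is disjoint from every test row's label interval, which is the property a shuffled k-fold split does not guarantee. The sketch purges symmetrically on both sides of the test block; a walk-forward variant would additionally discard all training rows after the test block.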

This is a preprint and has not been peer reviewed.
