🤖 AI Summary
This work investigates how heavily standard machine learning algorithms rely on data being independent and identically distributed (i.i.d.) by introducing a monotone adversarial contamination model, in which an adversary, after observing a clean i.i.d. sample, injects additional points labeled according to the true target function. Although this setting appears benign or even beneficial, the paper shows that state-of-the-art binary classification algorithms can suffer significant degradation in expected error under such contamination, exposing their overreliance on exchangeability of the data. Combining adversarial contamination modeling, statistical learning theory, and uniform convergence analysis, the paper demonstrates that all known optimal algorithms may fail in this regime, whereas algorithms grounded in uniform convergence retain their generalization guarantees, highlighting their robustness advantage.
📝 Abstract
We study the extent to which standard machine learning algorithms rely on exchangeability and independence of data by introducing a monotone adversarial corruption model. In this model, an adversary, upon looking at a "clean" i.i.d. dataset, inserts additional "corrupted" points of their choice into the dataset. These added points are constrained to be monotone corruptions, in that they get labeled according to the ground-truth target function. Perhaps surprisingly, we demonstrate that in this setting, all known optimal learning algorithms for binary classification can be made to achieve suboptimal expected error on a new independent test point drawn from the same distribution as the clean dataset. On the other hand, we show that uniform convergence-based algorithms do not degrade in their guarantees. Our results showcase how optimal learning algorithms break down in the face of seemingly helpful monotone corruptions, exposing their overreliance on exchangeability.
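To make the corruption model concrete, here is a minimal sketch (not from the paper; the threshold target function and the choice of inserted points are illustrative assumptions) of what a monotone adversary is allowed to do: it sees the clean i.i.d. sample, then appends extra points of its choosing, but every appended point must carry the label assigned by the ground-truth target function.

```python
import random

def target(x):
    # Hypothetical ground-truth target function: a threshold classifier on [0, 1].
    return 1 if x >= 0.5 else 0

def one_nearest_neighbor(dataset, x):
    # A simple example learner: predict the label of the closest training point.
    nearest = min(dataset, key=lambda p: abs(p[0] - x))
    return nearest[1]

random.seed(0)

# "Clean" i.i.d. sample, labeled by the ground-truth target function.
clean = [(x, target(x)) for x in (random.random() for _ in range(20))]

# Monotone adversary: after seeing the clean data, it inserts additional
# points of its choice -- but each inserted point is still labeled by the
# ground-truth target function, so the corrupted dataset remains realizable.
adversarial_points = [0.49, 0.499, 0.4999]
corrupted = clean + [(x, target(x)) for x in adversarial_points]

# Every label in the corrupted dataset is still consistent with the target:
# the corruption changes only the (no longer i.i.d.) distribution of inputs.
assert all(y == target(x) for x, y in corrupted)
```

The point of the sketch is what the adversary *cannot* do: no label noise is introduced, only the exchangeability/i.i.d. structure of the sample is broken, which is exactly the assumption the paper's results probe.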