AI Summary
This paper investigates the universal learning rates of empirical risk minimization (ERM) for binary classification in the agnostic setting. Specifically, it characterizes how fast the excess risk decays as a function of the sample size $n$ for any fixed data distribution. The work establishes, for the first time, a complete trichotomy of achievable rates for ERM in the agnostic setting: exponential ($e^{-n}$), sub-square-root ($o(n^{-1/2})$), and arbitrarily slow rates, along with necessary and sufficient conditions on concept classes for each category. Methodologically, it combines an extended VC theory, probabilistic inequalities, structural analysis of concept classes, and information-theoretic lower bound constructions. The same framework also characterizes the target-dependent and Bayes-dependent rates. The results show that every concept class falls into exactly one of these three mutually exclusive rate categories, thereby resolving the problem of classifying universal rates for agnostic ERM.
Abstract
The universal learning framework has been developed to obtain guarantees on learning rates that hold for any fixed distribution, which can be much faster than those that hold uniformly over all distributions. Given that the Empirical Risk Minimization (ERM) principle is fundamental in PAC theory and ubiquitous in practical machine learning, the recent work of arXiv:2412.02810 studied the universal rates of ERM for binary classification in the realizable setting. However, the realizability assumption is too restrictive to hold in practice. Indeed, the majority of the literature on universal learning has focused on the realizable case, leaving the non-realizable case barely explored. In this paper, we consider the problem of universal learning by ERM for binary classification in the agnostic setting, where the "learning curve" reflects the decay of the excess risk as the sample size increases. We explore the possible agnostic universal rates and reveal a compact trichotomy: there are three possible agnostic universal rates of ERM, namely $e^{-n}$, $o(n^{-1/2})$, or arbitrarily slow. We provide a complete characterization of which concept classes fall into each of these categories. Moreover, we establish complete characterizations of the target-dependent universal rates as well as the Bayes-dependent universal rates.
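To make the notion of a universal rate concrete, the following sketch states it in notation assumed here rather than taken from the paper: $P$ is the data distribution, $\mathcal{H}$ the concept class, $\hat{h}_n$ an ERM output on $n$ i.i.d. samples from $P$, $\mathrm{er}_P(h) = P(h(X) \neq Y)$ the error rate, and $R(n)$ a candidate rate function. It follows the usual convention in the universal learning literature; the paper's precise definitions may differ in details.

$$
\mathcal{E}_P(n) = \mathbb{E}\big[\mathrm{er}_P(\hat{h}_n)\big] - \inf_{h \in \mathcal{H}} \mathrm{er}_P(h),
\qquad
\forall P \;\; \exists\, C_P, c_P > 0 : \;\; \mathcal{E}_P(n) \le C_P \, R(c_P\, n) \;\; \text{for all } n .
$$

Under this reading, the rate $e^{-n}$ means $\mathcal{E}_P(n) \le C_P\, e^{-c_P n}$ with constants that may depend on $P$, whereas a uniform guarantee would require a single pair of constants valid for every $P$ simultaneously.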