🤖 AI Summary
This work addresses the problem of establishing tight error bounds for agnostic PAC learning in the small-error regime, where the optimal hypothesis error τ satisfies τ ≈ d/m (with d denoting the VC dimension and m the sample size). It resolves an open question of Hanneke, Larsen, and Zhivotovskiy (FOCS '24): whether a higher lower bound may hold when τ ≈ d/m. The proposed learner is computationally efficient, built from careful aggregations of ERM classifiers, and achieves error c·τ + O(√(τ(d + log(1/δ))/m) + (d + log(1/δ))/m) for a constant c ≤ 2.1, matching the known lower bound in this regime. Whether the constant can be reduced from 2.1 to 1, which would completely settle the complexity of agnostic learning, is left open.
📝 Abstract
Binary classification in the classic PAC model exhibits a curious phenomenon: Empirical Risk Minimization (ERM) learners are suboptimal in the realizable case yet optimal in the agnostic case. Roughly speaking, this owes itself to the fact that non-realizable distributions $\mathcal{D}$ are simply more difficult to learn than realizable distributions -- even when one discounts a learner's error by $\mathrm{err}(h^*_{\mathcal{D}})$, the error of the best hypothesis in $\mathcal{H}$ for $\mathcal{D}$. Thus, optimal agnostic learners are permitted to incur excess error on (easier-to-learn) distributions $\mathcal{D}$ for which $\tau = \mathrm{err}(h^*_{\mathcal{D}})$ is small. Recent work of Hanneke, Larsen, and Zhivotovskiy (FOCS '24) addresses this shortcoming by including $\tau$ itself as a parameter in the agnostic error term. In this more fine-grained model, they demonstrate tightness of the error lower bound $\tau + \Omega\left(\sqrt{\frac{\tau (d + \log(1/\delta))}{m}} + \frac{d + \log(1/\delta)}{m}\right)$ in a regime where $\tau > d/m$, and leave open the question of whether there may be a higher lower bound when $\tau \approx d/m$, with $d$ denoting $\mathrm{VC}(\mathcal{H})$. In this work, we resolve this question by exhibiting a learner which achieves error $c \cdot \tau + O\left(\sqrt{\frac{\tau (d + \log(1/\delta))}{m}} + \frac{d + \log(1/\delta)}{m}\right)$ for a constant $c \leq 2.1$, thus matching the lower bound when $\tau \approx d/m$. Further, our learner is computationally efficient and is based upon careful aggregations of ERM classifiers, making progress on two other questions of Hanneke, Larsen, and Zhivotovskiy (FOCS '24). We leave open the interesting question of whether our approach can be refined to lower the constant from 2.1 to 1, which would completely settle the complexity of agnostic learning.
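To set the quantitative picture side by side (notation as in the abstract: $d = \mathrm{VC}(\mathcal{H})$, sample size $m$, confidence $1 - \delta$, and $\tau = \mathrm{err}(h^*_{\mathcal{D}})$), the lower bound of Hanneke, Larsen, and Zhivotovskiy and the upper bound claimed here can be displayed as:

```latex
% Lower bound (Hanneke, Larsen, Zhivotovskiy, FOCS '24):
\mathrm{err}(h) \;\geq\; \tau
  + \Omega\!\left(\sqrt{\frac{\tau\,(d + \log(1/\delta))}{m}}
  + \frac{d + \log(1/\delta)}{m}\right)

% Upper bound of this work, for a constant c <= 2.1:
\mathrm{err}(h) \;\leq\; c \cdot \tau
  + O\!\left(\sqrt{\frac{\tau\,(d + \log(1/\delta))}{m}}
  + \frac{d + \log(1/\delta)}{m}\right)
```

In the regime $\tau \approx d/m$, the leading term $c \cdot \tau$ is itself $O\!\left(\frac{d}{m}\right)$ and is absorbed into the additive term, so the two bounds coincide up to constant factors; the remaining gap is precisely the constant $c$, which the authors bound by $2.1$ and conjecture might be lowered to $1$.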