🤖 AI Summary
This work investigates the optimal sampling complexity required to achieve dimension-independent $(1\pm\varepsilon)$ relative error guarantees in regularized classification tasks. It considers Lipschitz continuous loss functions (such as logistic, sigmoid, hinge, and ReLU losses) combined with $\ell_1/k$, $\ell_2/k$, and $\ell_2^2/k$ regularization. The analysis relies on higher moment bounds and empirical process arguments, which avoid the overcounting that arises in the standard VC-dimension and sensitivity framework. Upper bounds are matched by lower bounds up to polylogarithmic terms: $k^2/\varepsilon^2$ for $\ell_2/k$ and $k/\varepsilon^2$ for $\ell_1/k$ regularization. For $\ell_2^2/k$ regularization, a linear-in-$k$ sampling complexity is attained when $|g'(x)|\leq g(x)$, $g(0)>0$, and $g$ is monotonic or convex; otherwise the general bound is $k^2/\varepsilon^2$, and if $g(0)=0$ no dimension-free bound is possible. The algorithms use simple uniform or (squared) norm-based sampling and improve upon the prior $k^3/\varepsilon^2$ sensitivity sampling bound.
📝 Abstract
We prove optimal sampling bounds achieving $(1\pm\varepsilon)$-relative error for a broad class of Lipschitz continuous classification loss functions under various regularization terms. This class includes prominent examples such as logistic and sigmoid loss, hinge loss, and ReLU loss. In particular, we prove $k^2/\varepsilon^2$ upper and lower bounds for $\|\cdot\|_2/k$ regularization, and $k/\varepsilon^2$ upper and lower bounds for $\|\cdot\|_1/k$ regularization. For $\|\cdot\|_2^2/k$ regularization, the sampling complexity depends mainly on a bounded derivative property: if $|g'(x)|\leq g(x)$, $g(0)>0$, and $g$ is monotonic or convex, then it admits sampling complexity linear in $k$; otherwise the general bound is $k^2/\varepsilon^2$. However, if $g(0)=0$, our results indicate that no dimension-free bounds are possible, and even sublinear bounds are ruled out. All upper bounds are complemented by matching lower bounds up to polylogarithmic terms. Moreover, our work relies conceptually and algorithmically on simple uniform or (squared) norm sampling and thereby improves over the recent cubic $k^3/\varepsilon^2$ sensitivity sampling bounds of Alishahi and Phillips (ICML'24). This is achieved by refined arguments involving higher moment bounds and empirical process analyses that avoid the overcounting that appears in the de facto standard VC-dimension and sensitivity framework.
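To make the two sampling schemes concrete, here is a minimal Python sketch of uniform and squared-norm sampling for a regularized logistic loss. It is an illustration under stated assumptions, not the paper's algorithm. The objective form (average loss plus $\|x\|_2/k$), the mixing of squared-norm probabilities with a uniform component, and all function names are hypothetical choices made for this example.

```python
import math
import random


def logistic(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))


def objective(points, weights, x, k):
    # Assumed objective form: weighted loss sum plus ||x||_2 / k.
    loss = sum(w * logistic(dot(a, x)) for a, w in zip(points, weights))
    return loss + math.sqrt(dot(x, x)) / k


def uniform_sample(points, m, rng):
    # m draws with replacement, each weighted 1/m.
    picked = [points[rng.randrange(len(points))] for _ in range(m)]
    return picked, [1.0 / m] * m


def sq_norm_sample(points, m, rng):
    n = len(points)
    sq = [dot(a, a) for a in points]
    total = sum(sq)
    if total > 0:
        # Assumption: mix with uniform so every point has positive probability.
        probs = [0.5 / n + 0.5 * s / total for s in sq]
    else:
        probs = [1.0 / n] * n
    idx = rng.choices(range(n), weights=probs, k=m)
    # Weight 1/(n*m*p_i) makes the weighted loss an unbiased
    # estimate of the average loss over all n points.
    return [points[i] for i in idx], [1.0 / (n * m * probs[i]) for i in idx]


def demo(n=2000, d=5, k=4, m=400, seed=0):
    rng = random.Random(seed)
    # Synthetic points with Euclidean norm at most 1.
    points = [[rng.uniform(-1, 1) / math.sqrt(d) for _ in range(d)]
              for _ in range(n)]
    x = [rng.gauss(0, 1) for _ in range(d)]
    full = objective(points, [1.0 / n] * n, x, k)
    for sampler in (uniform_sample, sq_norm_sample):
        sub, w = sampler(points, m, rng)
        # Ratio near 1 means the sample approximates the full objective at x.
        print(sampler.__name__, objective(sub, w, x, k) / full)


if __name__ == "__main__":
    demo()
```

The sketch evaluates the ratio at a single query point `x`, whereas the guarantees in the abstract hold uniformly over all `x`. The sample size `m` here is arbitrary; the stated bounds suggest it should scale like $k^2/\varepsilon^2$ or $k/\varepsilon^2$ depending on the regularizer, up to polylogarithmic factors.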