🤖 AI Summary
This work addresses agnostic learning of multiclass linear classifiers under Gaussian marginals and presents fully polynomial-time robust learners with dimension-independent error guarantees for three or more classes, a setting where prior robust algorithms incurred exponential dependence on the inverse of the desired accuracy even for $k = 3$. It first shows that the standard multiclass perceptron requires super-polynomially many samples and updates, even with clean labels and Gaussian marginals. The positive results build on new structural results for multiclass linear classifiers and combine a pairwise improper-learning framework with a sharper localization-based framework. The guarantees are error $\widetilde{O}(k^{3/2}\sqrt{\text{opt}}) + \varepsilon$ for general $k$, $O(\text{opt}) + \varepsilon$ for $k = 3$, and $\text{poly}(k) \cdot \text{opt} + \varepsilon$ for geometrically regular $k$-class linear classifiers.
📝 Abstract
We study the task of agnostic learning of multiclass linear classifiers under the Gaussian distribution. Given labeled examples $(x, y)$ from a distribution over $\mathbb{R}^d \times [k]$, with Gaussian $x$-marginal, the goal is to output a hypothesis whose error is comparable to that of the best $k$-class linear classifier. While the binary case $k=2$ has a well-developed algorithmic theory, much less is known for $k \ge 3$. Even for $k=3$, prior robust algorithms incur exponential dependence on the inverse of the desired accuracy in both complexity and representation size. In this work, we develop new structural results for multiclass linear classifiers and use them to design fully polynomial-time robust learners with dimension-independent error guarantees. Our first result shows that the standard multiclass perceptron algorithm requires super-polynomially many samples and updates, even with clean labels and Gaussian marginals, revealing a basic obstruction absent in the binary case. Our main positive result is a pairwise improper-learning framework which yields an efficient learner with error $\widetilde O(k^{3/2}\sqrt{\mathrm{opt}})+\varepsilon$ for general $k$. Additionally, we develop a sharper localization-based framework which leads to error $O(\mathrm{opt})+\varepsilon$ for $k=3$, and error $\mathrm{poly}(k)\cdot\mathrm{opt}+\varepsilon$ for geometrically regular $k$-class linear classifiers.
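To make the problem setup concrete, the following is a minimal sketch of the hypothesis class and the error being compared against, not the paper's algorithm. It assumes a $k$-class linear classifier of the form $x \mapsto \arg\max_i (\langle w_i, x\rangle + b_i)$; the abstract does not state this parametrization, so it is an assumption. All function names are hypothetical. The label corruption here is random relabeling chosen only for illustration, whereas the agnostic model allows arbitrary label noise.

```python
import random

def predict(weights, biases, x):
    """Return the index of the class with the largest affine score <w_i, x> + b_i."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)

def sample_gaussian(d, rng):
    """Draw one point from the standard Gaussian on R^d."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]

def noisy_samples(weights, biases, n, relabel_prob, rng):
    """Label Gaussian points with the classifier, then relabel some at random.

    Random relabeling is an illustrative stand-in; agnostic noise may be
    adversarial. A relabeled point can receive its original label again.
    """
    k, d = len(weights), len(weights[0])
    samples = []
    for _ in range(n):
        x = sample_gaussian(d, rng)
        y = predict(weights, biases, x)
        if rng.random() < relabel_prob:
            y = rng.randrange(k)
        samples.append((x, y))
    return samples

def empirical_error(hypothesis, samples):
    """Fraction of samples on which the hypothesis disagrees with the label."""
    return sum(1 for x, y in samples if hypothesis(x) != y) / len(samples)

if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # k = 3 classes in R^2
    biases = [0.0, 0.0, 0.0]
    data = noisy_samples(weights, biases, 2000, 0.3, rng)
    err = empirical_error(lambda x: predict(weights, biases, x), data)
    print(f"empirical error of the generating classifier: {err:.3f}")
```

In this sketch, the error of the generating classifier on the corrupted sample plays the role of an upper bound on $\mathrm{opt}$, the error of the best $k$-class linear classifier. A guarantee such as $O(\mathrm{opt})+\varepsilon$ bounds the learner's error relative to that quantity.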