Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses computationally efficient proper agnostic learning of arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. For $K \geq 2$, the only previously known proper algorithm was brute-force search, with runtime exponential in the dimension $d$; the authors give the first efficient proper agnostic learner for this class. Its runtime is $d^{O(K^2 \log(1/\varepsilon)/\varepsilon^2)} + (K/\varepsilon)^{O(K^3/\varepsilon^{2.5})}$. The dependence on $d$ matches that of the best known improper learner, $d^{\widetilde{O}(K^2/\varepsilon^2)}$, and is essentially optimal in the statistical query model. For a single halfspace ($K=1$), the runtime improves from $d^{O(1/\varepsilon^4)} + (1/\varepsilon)^{O(1/\varepsilon^6)}$ to $d^{O(1/\varepsilon^2)} + (1/\varepsilon)^{O(1/\varepsilon^{2.5})}$.
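
To give a rough sense of the $K=1$ improvement in the dimension-dependent term, the exponents $1/\varepsilon^4$ and $1/\varepsilon^2$ can be compared at a concrete accuracy. The value $\varepsilon = 0.1$ is an illustrative assumption, and the constants hidden in the $O(\cdot)$ notation are not given in the source, so only the scaling is meaningful here:

$$
\varepsilon = 0.1: \qquad \frac{1}{\varepsilon^{2}} = 10^{2}, \qquad \frac{1}{\varepsilon^{4}} = 10^{4}, \qquad \frac{1/\varepsilon^{4}}{1/\varepsilon^{2}} = \frac{1}{\varepsilon^{2}} = 10^{2}.
$$

Up to those hidden constants, the exponent of $d$ shrinks by a factor of $1/\varepsilon^{2}$, which grows as the target accuracy tightens.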

📝 Abstract

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $ε$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/ε)/ε^2)} + (K/ε)^{O(K^3/ε^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/ε^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/ε^4)} + (1/ε)^{O(1/ε^6)}$. Our algorithm improves this to $d^{O(1/ε^2)} + (1/ε)^{O(1/ε^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/ε^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.
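
The objects in this setting can be made concrete with a small sketch. The code below is not the paper's algorithm. It only illustrates what a Boolean function of $K$ halfspaces is, how labeled samples with a Gaussian marginal might be drawn, and how the empirical 0-1 loss of a candidate hypothesis is measured. All names (`halfspace`, `function_of_halfspaces`, `zero_one_loss`, `gaussian_samples`) are hypothetical, and the random label flips are an assumed stand-in for the arbitrary labels allowed in the agnostic model.

```python
import random


def halfspace(w, theta):
    """Return the map x -> sign(<w, x> - theta), with values in {-1, +1}."""
    def h(x):
        s = sum(wi * xi for wi, xi in zip(w, x)) - theta
        return 1 if s >= 0 else -1
    return h


def function_of_halfspaces(halfspaces, g):
    """Compose K halfspaces with an arbitrary combiner g: {-1,+1}^K -> {-1,+1}."""
    def f(x):
        return g(tuple(h(x) for h in halfspaces))
    return f


def zero_one_loss(f, samples):
    """Fraction of labeled samples (x, y) that f misclassifies."""
    return sum(1 for x, y in samples if f(x) != y) / len(samples)


def gaussian_samples(n, d, target, noise_rate, rng):
    """Draw x ~ N(0, I_d); label with target(x), flipped with prob. noise_rate.

    Assumption: random flips are only an illustration. In the agnostic
    model the labels may be arbitrary, not just a noisy target.
    """
    samples = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = target(x)
        if rng.random() < noise_rate:
            y = -y
        samples.append((x, y))
    return samples


rng = random.Random(0)
d = 5

# K = 2 halfspaces combined by XOR: a Boolean function of halfspaces
# that is not an intersection.
hs = [halfspace([1, 0, 0, 0, 0], 0.0), halfspace([0, 1, 0, 0, 0], 0.0)]
xor = lambda bits: 1 if bits[0] != bits[1] else -1
f_star = function_of_halfspaces(hs, xor)

data = gaussian_samples(2000, d, f_star, noise_rate=0.1, rng=rng)

# Loss of the function that generated the labels (before flips),
# and of a single halfspace used as a proper hypothesis with K = 1.
loss_star = zero_one_loss(f_star, data)
single = halfspace([1, 0, 0, 0, 0], 0.0)
loss_single = zero_one_loss(single, data)
print(loss_star, loss_single)
```

In this toy setup, a proper learner for $K=2$ would have to return some function of two halfspaces whose loss is within $ε$ of the best such function. Returning a hypothesis from outside the class, as an improper learner may, would not qualify.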

Problem

Research questions and friction points this paper is trying to address.

proper agnostic learning

halfspaces

Gaussian marginals

computational efficiency

Boolean functions

Innovation

Methods, ideas, or system contributions that make the work stand out.

proper agnostic learning

halfspaces

Gaussian marginals