Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

📅 2026-05-26
🤖 AI Summary
This work addresses computationally efficient proper agnostic learning of arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Previously, the only known proper algorithm for $K \geq 2$ was brute-force search, with runtime exponential in the dimension $d$. The authors give the first efficient proper agnostic learner for this class, with runtime $d^{O(K^2 \log(1/\varepsilon)/\varepsilon^2)} + (K/\varepsilon)^{O(K^3/\varepsilon^{2.5})}$. The dependence on $d$ matches that of the best known improper learner, $d^{\widetilde{O}(K^2/\varepsilon^2)}$, and is essentially optimal in the statistical query model. For a single halfspace ($K=1$), the runtime improves from $d^{O(1/\varepsilon^4)} + (1/\varepsilon)^{O(1/\varepsilon^6)}$ to $d^{O(1/\varepsilon^2)} + (1/\varepsilon)^{O(1/\varepsilon^{2.5})}$.
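To give a rough sense of scale for the $K=1$ improvement, the exponents can be evaluated at a sample accuracy. This is only an illustration: the value $\varepsilon = 0.1$ is an arbitrary choice, and the constants hidden inside $O(\cdot)$ are not stated in the source and are ignored here.

```latex
\[
\varepsilon = 0.1:\qquad
\frac{1}{\varepsilon^{2}} = 10^{2},\quad
\frac{1}{\varepsilon^{2.5}} = 10^{2.5} \approx 316,\quad
\frac{1}{\varepsilon^{4}} = 10^{4},\quad
\frac{1}{\varepsilon^{6}} = 10^{6}.
\]
```

Up to those unspecified constants, the exponent of $d$ drops by a factor of about $100$ at this accuracy, and the exponent of $1/\varepsilon$ drops by a factor of about $3000$.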
📝 Abstract
We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $\varepsilon$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/\varepsilon)/\varepsilon^2)} + (K/\varepsilon)^{O(K^3/\varepsilon^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/\varepsilon^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/\varepsilon^4)} + (1/\varepsilon)^{O(1/\varepsilon^6)}$. Our algorithm improves this to $d^{O(1/\varepsilon^2)} + (1/\varepsilon)^{O(1/\varepsilon^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/\varepsilon^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.
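The learning setting described above can be made concrete with a small simulation. The sketch below is not the paper's algorithm; it only illustrates the objects involved: Gaussian inputs, a hypothesis that is a Boolean function of $K$ halfspaces, and the empirical 0-1 loss that a proper learner would compare against the best in class. All names (`halfspace_bits`, `boolean_of_halfspaces`, `zero_one_loss`), the XOR truth table, and the 10% random label flips are assumptions made for illustration. Random flips are a special case of the agnostic model, which allows arbitrary label distributions.

```python
import numpy as np

def halfspace_bits(X, W, b):
    """Return, for each row x of X, the K bits 1[<w_k, x> > b_k]."""
    return (X @ W.T - b > 0).astype(int)

def boolean_of_halfspaces(X, W, b, table):
    """Evaluate g(bits), where g is a truth table of length 2^K with +/-1 entries.

    Bit k contributes 2^k to the table index (hypothetical encoding).
    """
    bits = halfspace_bits(X, W, b)
    K = W.shape[0]
    idx = bits @ (1 << np.arange(K))
    return table[idx]

def zero_one_loss(y_pred, y):
    """Fraction of points on which the prediction disagrees with the label."""
    return float(np.mean(y_pred != y))

rng = np.random.default_rng(0)
d, K, n = 5, 2, 20000

# Gaussian marginal on R^d, as in the setting above.
X = rng.standard_normal((n, d))

# An illustrative member of the class: XOR of two homogeneous halfspaces.
W = rng.standard_normal((K, d))
b = np.zeros(K)
xor_table = np.array([1, -1, -1, 1])
clean = boolean_of_halfspaces(X, W, b, xor_table)

# Assumed noise for the demo: flip each label independently with probability 0.1.
flip = rng.random(n) < 0.1
y = np.where(flip, -clean, clean)

# Empirical 0-1 loss of the generating hypothesis and of a constant baseline.
loss_in_class = zero_one_loss(clean, y)
loss_constant = zero_one_loss(np.ones(n, dtype=int), y)
```

A proper agnostic learner must return some `(W, b, table)` triple from this class whose 0-1 loss is within $\varepsilon$ of the best achievable in the class; here `loss_in_class` equals the fraction of flipped labels by construction.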
Problem

Research questions and friction points this paper is trying to address.

proper agnostic learning
halfspaces
Gaussian marginals
computational efficiency
Boolean functions