🤖 AI Summary
This paper studies efficient learning of intersections of halfspaces under factorizable distributions. Prior work established a quasi-polynomial $d^{\mathcal{O}(\log(1/\gamma))}$ lower bound for this problem in the correlational statistical query (CSQ) model, where $\gamma$ denotes the margin parameter and $d$ the dimension. To overcome this bottleneck, we propose the first polynomial-time algorithm that bypasses this lower bound in the more general statistical query (SQ) model. Our approach integrates a refined variant of Jennrich's algorithm, PCA over random projections of the moment tensor, and gradient-descent-based non-convex optimization, complemented by a novel duality framework that characterizes the moment structure induced by the marginal distribution. We prove that the algorithm runs in $\mathrm{poly}(d, 1/\gamma)$ time. This result establishes the first strong separation between the CSQ and SQ models for this problem, significantly expanding the frontier of efficiently learnable classes, particularly in weak-learning regimes.
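To give a flavor of one ingredient named above, the toy sketch below shows how PCA applied to a random projection (contraction) of a third-order moment tensor can recover the subspace spanned by hidden directions. Everything here is illustrative: the synthetic tensor $T = \sum_i a_i^{\otimes 3}$ and all variable names are assumptions for the demo, not the paper's actual moment construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 2
A = rng.standard_normal((d, k))          # hidden directions a_1, ..., a_k (toy stand-in)
# Synthetic third-order "moment" tensor T = sum_i a_i ⊗ a_i ⊗ a_i
T = np.einsum('ai,bi,ci->abc', A, A, A)

# Contract T along a random Gaussian direction g:
#   M = T(·, ·, g) = sum_i <g, a_i> a_i a_i^T  (symmetric, rank k generically)
g = rng.standard_normal(d)
M = np.einsum('abc,c->ab', T, g)

# PCA step: the eigenvectors of M with the k largest |eigenvalues|
# span the same subspace as the hidden directions a_i.
eigvals, eigvecs = np.linalg.eigh(M)
order = np.argsort(-np.abs(eigvals))
U = eigvecs[:, order[:k]]                # orthonormal basis for span{a_1, ..., a_k}
```

The random contraction collapses the tensor to a matrix whose column space still equals the hidden subspace, which is what makes ordinary eigendecomposition applicable.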
📝 Abstract
Learning intersections of halfspaces is a central problem in Computational Learning Theory. Even for just two halfspaces, it remains a major open question whether learning is possible in time polynomial in the margin $\gamma$ of the data points and their dimension $d$. The best-known algorithms run in quasi-polynomial time $d^{O(\log(1/\gamma))}$, and it has been shown that this complexity is unavoidable for any algorithm relying solely on correlational statistical queries (CSQ). In this work, we introduce a novel algorithm that provably circumvents the CSQ hardness barrier. Our approach applies to a broad class of distributions satisfying a natural, previously studied factorizability assumption. Factorizable distributions lie between distribution-specific and distribution-free settings and significantly extend the previously known tractable cases. Under these distributions, we show that CSQ-based methods still require quasi-polynomial time even for weak learning, whereas our algorithm achieves $\mathrm{poly}(d, 1/\gamma)$ time by leveraging more general statistical queries (SQ), establishing a strong separation between CSQ and SQ for this simple realizable PAC learning problem. Our result is grounded in a rigorous analysis utilizing a novel duality framework that characterizes the moment tensor structure induced by the marginal distributions. Building on these structural insights, we propose new, efficient learning algorithms that combine a refined variant of Jennrich's Algorithm with PCA over random projections of the moment tensor, along with a gradient-descent-based non-convex optimization framework.
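To make the tensor-decomposition ingredient concrete, here is a minimal, self-contained sketch of classical Jennrich-style simultaneous diagonalization on a synthetic third-order tensor. It illustrates only the textbook algorithm on toy data; the paper's refined variant and its actual moment tensors are not reproduced here, and all names and parameters below are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 3
A = rng.standard_normal((d, k))          # hidden components (generic, linearly independent)
w = rng.uniform(1.0, 2.0, size=k)        # positive weights
# Synthetic tensor T = sum_i w_i * a_i ⊗ a_i ⊗ a_i
T = np.einsum('i,ai,bi,ci->abc', w, A, A, A)

# Contract T along two independent random directions x and y:
x = rng.standard_normal(d)
y = rng.standard_normal(d)
Mx = np.einsum('abc,c->ab', T, x)        # = A diag(w_i <x, a_i>) A^T
My = np.einsum('abc,c->ab', T, y)        # = A diag(w_i <y, a_i>) A^T

# Key fact: Mx My^+ = A diag(<x,a_i>/<y,a_i>) A^+, so its eigenvectors
# with non-negligible eigenvalues are the components a_i (up to scale).
vals, vecs = np.linalg.eig(Mx @ np.linalg.pinv(My))
top = np.argsort(-np.abs(vals))[:k]      # drop near-zero eigenvalues from the kernel
recovered = np.real(vecs[:, top])        # unit-norm estimates of a_i / ||a_i||
```

Generic random contractions make the eigenvalue ratios distinct with probability one, which is what lets a single eigendecomposition separate the components; this uniqueness is the classical reason Jennrich's method avoids iterative alternating schemes.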