AI Summary
This paper studies active learning of general (not necessarily homogeneous) halfspaces under the Gaussian distribution, focusing on the sample-efficiency bottlenecks caused by skewed (biased) labels and agnostic noise. We prove that, in pool-based active learning, adaptive label queries cannot meaningfully improve on passive learning when the pool of unlabeled samples has polynomial size. In contrast, we propose a computationally efficient, adaptive membership-query algorithm that works even in the agnostic setting, which establishes a separation between the label-query and membership-query paradigms. Technically, our approach combines geometric characterizations of Gaussian space, information-theoretic lower-bound analysis, query selection driven by the bias $p$, and a novel decomposition of the agnostic error. Our algorithm achieves query complexity $\tilde{O}(\min\{1/p, 1/\epsilon\} + d \cdot \mathrm{polylog}(1/\epsilon))$ and classification error $O(\mathrm{opt}) + \epsilon$, circumventing the lower bound that applies to label queries.
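To see why a polynomial-size pool cannot help, here is a rough calculation based on the two bounds stated in the abstract, ignoring the polylogarithmic factors hidden by the tilde notation. The exponents $c$ and $\gamma$ are illustrative symbols introduced here, not taken from the source.

$$
\frac{\text{passive label complexity}}{\text{active lower bound}} \;\lesssim\; \frac{d/\epsilon}{d/(\log(m)\,\epsilon)} \;=\; \log m
$$

With a pool of size $m = d^{c}$, this ratio is $c \log d$, so label queries save at most a logarithmic factor. Saving a polynomial factor $d^{\gamma}$ with $\gamma > 0$ would require $\log m \gtrsim d^{\gamma}$, that is, $m \gtrsim 2^{d^{\gamma}} = 2^{\mathrm{poly}(d)}$.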
Abstract
We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces under the Gaussian distribution on $\mathbb{R}^d$ in the presence of some form of query access. In the classical pool-based active learning model, where the algorithm is allowed to make adaptive label queries to previously sampled points, we establish a strong information-theoretic lower bound ruling out non-trivial improvements over the passive setting. Specifically, we show that any active learner requires label complexity of $\tilde{\Omega}(d/(\log(m)\epsilon))$, where $m$ is the number of unlabeled examples. In particular, to beat the passive label complexity of $\tilde{O}(d/\epsilon)$, an active learner requires a pool of $2^{\mathrm{poly}(d)}$ unlabeled samples. On the positive side, we show that this lower bound can be circumvented with membership query access, even in the agnostic model. Specifically, we give a computationally efficient learner with query complexity of $\tilde{O}(\min\{1/p, 1/\epsilon\} + d \cdot \mathrm{polylog}(1/\epsilon))$ achieving an error guarantee of $O(\mathrm{opt}) + \epsilon$. Here $p \in [0, 1/2]$ is the bias and $\mathrm{opt}$ is the 0-1 loss of the optimal halfspace. As a corollary, we obtain a strong separation between the active and membership query models. Taken together, our results characterize the complexity of learning general halfspaces under Gaussian marginals in these models.
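To make the role of the bias $p$ concrete, the following is a minimal Python sketch; it is not the paper's algorithm. It assumes a unit-norm weight vector `w` and threshold `t`, so that the general halfspace $\mathrm{sign}(\langle w, x \rangle - t)$ has bias $p = \Phi(-|t|)$ under the standard Gaussian, where $\Phi$ is the standard normal CDF. All function names are hypothetical. The sketch counts how many random Gaussian draws are needed before the rarer label first appears, which is about $1/p$ on average.

```python
import math
import random


def gaussian_cdf(z: float) -> float:
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def halfspace_label(w, t, x):
    """Label of x under the general halfspace sign(<w, x> - t)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - t >= 0 else -1


def bias(t: float) -> float:
    """Bias p = min(Pr[+1], Pr[-1]); assumes w has unit norm (an assumption)."""
    return gaussian_cdf(-abs(t))


def draws_until_minority(w, t, rng, max_draws=1_000_000):
    """Count random Gaussian draws until the rarer label first appears."""
    minority = 1 if t > 0 else -1  # for t > 0 the +1 side has mass Phi(-t) < 1/2
    d = len(w)
    for n in range(1, max_draws + 1):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if halfspace_label(w, t, x) == minority:
            return n
    return max_draws


if __name__ == "__main__":
    rng = random.Random(0)
    d, t = 5, 2.0
    w = [1.0] + [0.0] * (d - 1)  # unit-norm direction along the first axis
    p = bias(t)
    trials = [draws_until_minority(w, t, rng) for _ in range(200)]
    print(f"bias p = {p:.4f}, 1/p = {1 / p:.1f}")
    print(f"mean draws until minority label = {sum(trials) / len(trials):.1f}")
```

Setting `t = 0.0` recovers the homogeneous case with $p = 1/2$, where both labels show up after a couple of draws. As `|t|` grows, $p$ shrinks and random sampling rarely sees the minority side, which is the regime where the bias enters the query bound above.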