🤖 AI Summary
This work investigates the computational power of replicable learning under arbitrary data distributions, focusing on parity functions, a task that is known to be hard in the Statistical Query (SQ) model. Drawing on ideas from differential privacy, replicable PAC learning, and linear algebra, the paper presents the first computationally efficient replicable algorithm for realizable learning of parities over any distribution. This gives the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ learning, which places it closer in power to efficient differentially private learning, although computational separations between replicability and privacy remain. The core technical contribution is an efficient replicable subspace-covering algorithm: given a set of vectors, it outputs a subspace of their linear span that covers most of them.
📝 Abstract
We study the computational relationship between replicability (Impagliazzo et al. [STOC '22], Ghazi et al. [NeurIPS '21]) and other stability notions. Specifically, we focus on replicable PAC learning and its connections to differential privacy (Dwork et al. [TCC 2006]) and to the statistical query (SQ) model (Kearns [JACM '98]). Statistically, it was known that differentially private learning and replicable learning are equivalent and strictly more powerful than SQ-learning. Yet, computationally, all previously known efficient (i.e., polynomial-time) replicable learning algorithms were confined to SQ-learnable tasks or restricted distributions, in contrast to differentially private learning. Our main contribution is the first computationally efficient replicable algorithm for realizable learning of parities over arbitrary distributions, a task that is known to be hard in the SQ-model, but possible under differential privacy. This result provides the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ-learning, and is closer in power to efficient differentially private learning, despite computational separations between replicability and privacy. Our main building block is a new, efficient, and replicable algorithm that, given a set of vectors, outputs a subspace of their linear span that covers most of them.
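For context, the task itself (realizable learning of parities) can be illustrated with the classical, non-replicable baseline: Gaussian elimination over GF(2) on the labeled sample. The sketch below is not the algorithm of this paper; the function names `parity` and `learn_parity` and the bitmask encoding of examples are assumptions made for illustration only.

```python
import random


def parity(v):
    """Parity (mod 2) of the number of set bits in the integer v."""
    return bin(v).count("1") & 1


def learn_parity(samples, n):
    """Return a bitmask s with parity(s & x) == y for every (x, y) in samples.

    Each x is an n-bit integer and each y is 0 or 1. Returns None if no
    parity function is consistent with the sample (non-realizable data).
    """
    mask = (1 << n) - 1
    basis = {}  # pivot column -> row whose highest set x-bit is that column
    for x, y in samples:
        row = x | (y << n)  # bit n carries the label
        for col in range(n - 1, -1, -1):
            if not (row >> col) & 1:
                continue
            if col in basis:
                row ^= basis[col]  # eliminate this column
            else:
                basis[col] = row  # new independent equation
                break
        else:
            # x part reduced to zero; a leftover label of 1 means 0 = 1.
            if row >> n:
                return None
    # Back-substitution, lowest pivot first; free coordinates are set to 0.
    s = 0
    for col in sorted(basis):
        row = basis[col]
        lower = row & mask & ~(1 << col)
        if (row >> n) ^ parity(lower & s):
            s |= 1 << col
    return s


if __name__ == "__main__":
    rng = random.Random(0)
    n, secret = 8, 0b10110010
    # A distribution supported on a proper subspace: the top four bits are 0.
    xs = [rng.randrange(1 << 4) for _ in range(20)]
    data = [(x, parity(x & secret)) for x in xs]
    print(bin(learn_parity(data, n)))
```

When the examples do not span the whole space, many parities are consistent with the sample, and which one is returned depends on the span of the particular sample drawn. This is one intuition for why the plain elimination approach is not replicable over arbitrary distributions, and why a subspace that is stable across samples is a natural building block.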