🤖 AI Summary
This work addresses the challenge that learning high-dimensional parity functions typically requires exponential sample complexity under general settings, rendering them intractable for gradient-based methods. The authors propose a novel approach combining product-type neural networks, sparse Bernoulli inputs with $p_e \leq 1/N$, and carefully tuned hyperparameters, achieving polynomial sample complexity via gradient descent for the first time in dimensions as high as $N = 10^5$. Theoretical analysis establishes a crucial connection between the network’s inductive bias and input sparsity, providing convergence guarantees. Empirical validation confirms the method’s efficacy, identifies optimal choices for $p_e$ and learning rate $\alpha$, and reveals clear polynomial scaling behavior.
📝 Abstract
Parity functions are fundamental Boolean operations with critical applications across machine learning, cryptography, and error correction. Yet learning high-dimensional parity functions poses significant challenges: in a general setting, standard neural network architectures typically require exponential sample complexity, making gradient-based optimization intractable for a large number of inputs $N$. We demonstrate that compact product-based neural architectures, combined with stochastic data sparsity (Bernoulli inputs with $p_e \leq 1/N$) and appropriate hyperparameter choices, enable efficient parity learning with theoretical guarantees of convergence. Experiments validate our theory across dimensions up to $N = 100{,}000$, with empirical evidence identifying optimal choices for $p_e$ and the learning rate $\alpha$, as well as polynomial complexity scaling laws. This work establishes fundamental connections between architectural inductive bias and data sparsity, opening new possibilities for neural arithmetic, structured reasoning, binary neural networks, and machine learning applied to automated protocol discovery.
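The setup described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the architecture, so the single product unit $f(x) = \prod_i (1 - 2 w_i x_i)$, the squared loss, the zero initialization, and all function names and default values below are assumptions chosen for illustration. The only ingredients taken from the text are Bernoulli inputs with $p_e = 1/N$, a product-type model, and plain gradient descent with learning rate $\alpha$ (`lr` in the code).

```python
import numpy as np


def sample_sparse_inputs(rng, n_samples, n_bits, p_e):
    """Draw i.i.d. Bernoulli(p_e) bits in {0, 1}; with p_e <= 1/N most bits are 0."""
    return (rng.random((n_samples, n_bits)) < p_e).astype(np.float64)


def parity_pm1(x):
    """Parity of each row over all bits: +1 for an even number of ones, -1 for odd."""
    return 1.0 - 2.0 * (x.sum(axis=1) % 2)


def product_unit(x, w):
    """Hypothetical product-type unit: prod_i (1 - 2 * w_i * x_i).

    With w_i = 1 for every bit this equals the +/-1 parity exactly.
    """
    return np.prod(1.0 - 2.0 * w * x, axis=1)


def loss_and_grad(x, y, w):
    """Half mean squared error of the product unit and its gradient w.r.t. w."""
    t = 1.0 - 2.0 * w * x  # per-bit factors, shape (n_samples, n_bits)
    # Leave-one-out products via prefix/suffix cumulative products,
    # which avoids dividing by a factor that may be zero.
    prefix = np.ones_like(t)
    suffix = np.ones_like(t)
    prefix[:, 1:] = np.cumprod(t[:, :-1], axis=1)
    suffix[:, :-1] = np.cumprod(t[:, :0:-1], axis=1)[:, ::-1]
    leave_one_out = prefix * suffix
    f = prefix[:, -1] * t[:, -1]  # full product over all bits
    residual = f - y
    loss = 0.5 * np.mean(residual ** 2)
    # d f / d w_j = (-2 * x_j) * prod_{i != j} t_i
    grad = np.mean(residual[:, None] * (-2.0 * x) * leave_one_out, axis=0)
    return loss, grad


def train(x, y, lr=0.5, n_steps=300):
    """Full-batch gradient descent from w = 0 (an assumed initialization)."""
    w = np.zeros(x.shape[1])
    losses = []
    for _ in range(n_steps):
        loss, grad = loss_and_grad(x, y, w)
        losses.append(loss)
        w -= lr * grad
    return w, losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_bits = 8  # toy dimension, far below the N = 100,000 reported in the abstract
    x = sample_sparse_inputs(rng, 2000, n_bits, p_e=1.0 / n_bits)
    w, losses = train(x, parity_pm1(x))
    print("first loss:", losses[0], "last loss:", losses[-1])
```

In this toy setting each sample activates roughly one bit on average, since $N p_e = 1$. Most products therefore reduce to one or two non-trivial factors, which is one plausible reading of why sparsity and a multiplicative inductive bias fit together. The sketch makes no claim about the scaling behavior or the optimal $p_e$ and $\alpha$ reported in the paper.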