🤖 AI Summary
This work establishes the inherent difficulty of learning a fixed parity function, such as the full-coordinate XOR, with one-hidden-layer ReLU networks trained by perturbed gradient descent. Classical statistical query lower bounds only guarantee that for each algorithm *some* worst-case parity is hard; here, the authors rigorously prove that for any fixed parity over a sufficiently large set of coordinates, this standard training pipeline fails to produce anything meaningful. Methodologically, the proof rests on a new decay bound on the Fourier coefficients of linear threshold (weighted majority) functions, a result of independent interest that bridges Fourier analysis and computational learning theory. The upshot is that the observed failure is not an artifact of a particular algorithm's shortcomings but reflects a genuine tension between what ReLU networks can represent and what gradient-based optimization can find, offering a new theoretical lens on the gap between expressivity and trainability in deep learning.
📝 Abstract
Learning parity functions is a canonical problem in learning theory which, although computationally tractable, is not amenable to standard learning algorithms such as gradient-based methods. This hardness is usually explained via statistical query lower bounds [Kearns, 1998]. However, these bounds only imply that for any given algorithm, there is some worst-case parity function that will be hard to learn. Thus, they do not explain why fixed parities - say, the full parity function over all coordinates - are difficult to learn in practice, at least with standard predictors and gradient-based methods [Abbe and Boix-Adsera, 2022]. In this paper, we address this open problem by showing that for any fixed parity of some minimal size, using it as a target function to train one-hidden-layer ReLU networks with perturbed gradient descent will fail to produce anything meaningful. To establish this, we prove a new result about the decay of the Fourier coefficients of linear threshold (or weighted majority) functions, which may be of independent interest.
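To get a feel for the Fourier-analytic mechanism, one can check numerically that a basic linear threshold function, the unweighted majority, has a Fourier coefficient on the full coordinate set (equivalently, a correlation with the full parity) that shrinks as the dimension grows. This is only a small illustration of the flavor of the decay phenomenon, not the paper's actual bound, and the function names below are ours.

```python
import itertools

def majority(x):
    # A basic linear threshold function: the sign of the coordinate sum
    # (n is kept odd so the sum is never zero).
    return 1 if sum(x) > 0 else -1

def full_set_coeff(n):
    # Fourier coefficient of majority on the full coordinate set:
    # E_x[ majority(x) * x_1 * ... * x_n ] over uniform x in {-1, +1}^n,
    # i.e. the correlation between majority and the full parity.
    total = 0
    for x in itertools.product((-1, 1), repeat=n):
        chi = 1
        for xi in x:
            chi *= xi
        total += majority(x) * chi
    return total / 2 ** n

for n in (3, 5, 7, 9, 11):
    print(n, abs(full_set_coeff(n)))
```

Running this prints one magnitude per dimension, decreasing from 0.5 at n = 3; loosely speaking, this vanishing correlation between threshold-type units and the target parity is what starves gradient-based training of useful signal.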