🤖 AI Summary
This study investigates the solvability (SAT/UNSAT) phase transition for storing random pattern–label associations in two continuous non-convex models: the perceptron with negative margin, and an infinite-width two-layer neural network with non-overlapping receptive fields and a generic activation function. Using the full replica symmetry breaking (full-RSB) ansatz, the authors compute the exact critical capacity at which the transition occurs. For the negative perceptron, the analysis shows that in certain regions of the phase diagram (set by the margin and the pattern density) the overlap distribution of typical states has disconnected support, an "overlap gap"; as a consequence, recent theorems guaranteeing convergence of approximate message passing (AMP) algorithms to capacity do not apply there. Moreover, gradient descent fails to reach the maximal capacity regardless of whether typical states exhibit an overlap gap. As in binary-weight models, this suggests that gradient-based algorithms are biased toward highly atypical states, and that the algorithmic threshold is determined by where those states become inaccessible.
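To make the two settings concrete, here is a plausible formalization; the notation, normalizations, and the tree-like split of the input into receptive fields follow standard conventions in this literature and are assumptions, not taken verbatim from the paper:

```latex
% Negative-margin perceptron: find w on the sphere |w|^2 = N such that
% all P = \alpha N random constraints are satisfied with margin \kappa < 0:
y^\mu \, \frac{\boldsymbol{w} \cdot \boldsymbol{x}^\mu}{\sqrt{N}} \;\ge\; \kappa,
\qquad \mu = 1, \dots, \alpha N .

% Two-layer network with K non-overlapping receptive fields
% (each hidden unit sees its own disjoint block x_k of N/K inputs),
% generic activation \sigma, infinite-width limit K \to \infty:
\hat{y}(\boldsymbol{x}) \;=\; \operatorname{sign}\!\left(
  \frac{1}{\sqrt{K}} \sum_{k=1}^{K}
  \sigma\!\left( \frac{\boldsymbol{w}_k \cdot \boldsymbol{x}_k}{\sqrt{N/K}} \right)
\right).
```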
📝 Abstract
We analyze the problem of storing random pattern–label associations using two classes of continuous-weight non-convex models, namely the perceptron with negative margin and an infinite-width two-layer neural network with non-overlapping receptive fields and generic activation function. Using a full-RSB ansatz we compute the exact value of the SAT/UNSAT transition. Furthermore, in the case of the negative perceptron we show that the overlap distribution of typical states displays an overlap gap (i.e., disconnected support) in certain regions of the phase diagram defined by the value of the margin and the density of patterns to be stored. This implies that some recent theorems that ensure convergence of Approximate Message Passing (AMP) based algorithms to capacity are not applicable. Finally, we show that Gradient Descent is not able to reach the maximal capacity, irrespective of the presence of an overlap gap for typical states. This finding, similar to what occurs in binary-weight models, suggests that gradient-based algorithms are biased towards highly atypical states, whose inaccessibility determines the algorithmic threshold.
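A minimal sketch of the kind of gradient-descent experiment on the negative-margin perceptron that the abstract refers to; the hinge-style loss, learning rate, spherical projection, and all variable names are illustrative assumptions, not the paper's actual protocol:

```python
import numpy as np

# Sketch: gradient descent on a hinge loss for the negative-margin perceptron.
# Constraints: y_mu * (w . x_mu) / sqrt(N) >= kappa, with kappa < 0.

rng = np.random.default_rng(0)

N = 500            # input dimension
alpha = 2.0        # pattern density: P = alpha * N
kappa = -0.5       # negative margin
P = int(alpha * N)

X = rng.standard_normal((P, N))        # random Gaussian patterns
y = rng.choice([-1.0, 1.0], size=P)    # random labels

w = rng.standard_normal(N)
w *= np.sqrt(N) / np.linalg.norm(w)    # spherical constraint |w|^2 = N

lr = 0.05
for step in range(20_000):
    margins = y * (X @ w) / np.sqrt(N)
    viol = margins < kappa             # unsatisfied constraints
    if not viol.any():
        print(f"all constraints satisfied at step {step}")
        break
    # gradient of sum_mu max(0, kappa - margin_mu) over violated patterns
    grad = -(y[viol, None] * X[viol]).sum(axis=0) / np.sqrt(N)
    w -= lr * grad
    w *= np.sqrt(N) / np.linalg.norm(w)  # project back onto the sphere
else:
    print(f"GD stalled; fraction violated: {viol.mean():.3f}")
```

Running such a sketch at increasing alpha gives a rough empirical picture of the claim: well below the full-RSB capacity GD satisfies all constraints, while close to it GD stalls with a residual fraction of violated patterns.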