🤖 AI Summary
This work addresses the failure of traditional margin-based generalization theories in zero-margin nonlinear classification problems such as Gaussian XOR by proposing a dynamic neuron-block evolution framework. It shows that during training, neurons cluster into four directional blocks whose dynamics are coordinated. By integrating Gaussian XOR modeling, block-level clustering analysis, and average-case generalization theory, the authors develop a two-stage dynamic evolution model that theoretically predicts the trajectory of neuron-block evolution. Numerical experiments under both Gaussian and non-Gaussian settings confirm the robustness of this framework. The study breaks away from the prevailing generalization analysis paradigm that relies on positive-margin assumptions and effectively distinguishes regions of reliable prediction from regions of persistent error.
📝 Abstract
The ability of neural networks to learn useful features through stochastic gradient descent (SGD) is a cornerstone of their success. Most theoretical analyses focus on regression or on classification tasks with a positive margin, where worst-case gradient bounds suffice. In contrast, we study zero-margin nonlinear classification by analyzing the Gaussian XOR problem, where inputs are Gaussian and the XOR decision boundary determines labels. In this setting, a non-negligible fraction of data lies arbitrarily close to the boundary, breaking standard margin-based arguments. Building on Glasgow's (2024) analysis, we extend the study of training dynamics from discrete to Gaussian inputs and develop a framework for the dynamics of neuron blocks. We show that neurons cluster into four directions and that block-level signals evolve coherently, a phenomenon essential in the Gaussian setting where individual neuron signals vary significantly. Leveraging this block perspective, we analyze generalization without relying on margin assumptions, adopting an average-case view that distinguishes regions of reliable prediction from regions of persistent error. Numerical experiments confirm the predicted two-phase block dynamics and demonstrate their robustness beyond the Gaussian setting.
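To make the zero-margin claim concrete, here is a minimal sketch of the Gaussian XOR data model as described above, assuming the standard setup: inputs drawn from a two-dimensional standard Gaussian and labels given by the sign of the coordinate product. The distance from a point to the XOR decision boundary (the coordinate axes) is `min(|x1|, |x2|)`, and the empirical fraction of points within any fixed distance of the boundary stays bounded away from zero, which is exactly what breaks positive-margin assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian XOR data model (a sketch, not the authors' code):
# inputs x ~ N(0, I_2), label y = sign(x1 * x2).
n = 100_000
X = rng.standard_normal((n, 2))
y = np.sign(X[:, 0] * X[:, 1])

# The decision boundary {x : x1 * x2 = 0} is the union of the two axes,
# so the distance from a point to the boundary is min(|x1|, |x2|).
dist_to_boundary = np.min(np.abs(X), axis=1)

# Zero margin: for any eps > 0, a constant fraction of the data lies
# within eps of the boundary (here roughly 15% for eps = 0.1).
eps = 0.1
near_boundary = np.mean(dist_to_boundary < eps)
print(f"fraction within {eps} of the boundary: {near_boundary:.3f}")
```

Shrinking `eps` lowers this fraction but never to zero at any fixed `eps`, so worst-case gradient bounds that rely on a uniform positive margin do not apply in this setting.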