🤖 AI Summary
This work investigates the token selection behavior and generalization properties of Transformer attention mechanisms under label noise. Addressing classification tasks with noisy labels, we propose a signal-to-noise ratio (SNR)-based theoretical framework that, for the first time, formally characterizes the phenomenon of "benign overfitting" in this setting: models fit noisy labels yet retain high generalization accuracy, with generalization emerging abruptly late in training, a behavior termed *delayed generalization*. We theoretically prove that attention-based token selection enables this benign overfitting. Empirical validation on synthetic and real-world datasets consistently reproduces the predicted delayed generalization curves, confirming our theoretical analysis. Our findings challenge conventional notions of overfitting and offer a novel perspective on the robustness and generalization mechanisms of deep models, particularly Transformers, under label corruption.
📝 Abstract
The attention mechanism is a fundamental component of the Transformer model and plays a significant role in its success. However, the theoretical understanding of how attention learns to select tokens is still an emerging area of research. In this work, we study the training dynamics and generalization ability of the attention mechanism on classification problems with label noise. We show that, under a characterization of the data by its signal-to-noise ratio (SNR), the token selection of the attention mechanism achieves benign overfitting, i.e., it maintains high generalization performance despite fitting label noise. Our work also demonstrates an interesting delayed acquisition of generalization after an initial phase of overfitting. Finally, we provide experiments supporting our theoretical analysis on both synthetic and real-world datasets.
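To make the notion of benign overfitting concrete, here is a minimal numerical sketch, not the paper's attention model: a linear minimum-norm interpolator on synthetic signal-plus-noise data in the high-dimensional regime (d much larger than n). All specifics (the signal direction `mu`, its norm, the 10% label-flip rate, and the dimensions) are illustrative choices, not values from the paper. The model fits every corrupted training label exactly, yet still classifies fresh clean data accurately.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 1000, 100, 500      # high-dimensional regime: d >> n_train
mu = np.zeros(d)
mu[0] = 5.0                              # illustrative class-signal direction, ||mu|| = 5

def sample(n):
    """Each example is the class signal y * mu plus isotropic Gaussian noise."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + rng.normal(size=(n, d))
    return X, y

X, y_clean = sample(n_train)
y_noisy = y_clean.copy()
y_noisy[rng.random(n_train) < 0.1] *= -1.0   # corrupt ~10% of the training labels

# Minimum-norm interpolator w = X^T (X X^T)^{-1} y_noisy: it fits every
# (noisy) training label exactly, i.e., it overfits the label noise.
w = X.T @ np.linalg.solve(X @ X.T, y_noisy)

train_acc = np.mean(np.sign(X @ w) == y_noisy)   # accuracy on the *noisy* labels
X_test, y_test = sample(n_test)                  # fresh, clean test data
test_acc = np.mean(np.sign(X_test @ w) == y_test)

print(f"train accuracy on noisy labels: {train_acc:.2f}")   # interpolates: 1.00
print(f"test accuracy on clean labels:  {test_acc:.2f}")    # yet generalizes well
```

The mechanism at work is that, with d >> n, the interpolator can absorb each training point's noise (and the flipped labels) in directions nearly orthogonal to the signal, so memorization barely perturbs predictions on fresh data. The paper's contribution is showing that attention-based token selection realizes an analogous separation, along with the delayed emergence of generalization during training.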