Noise Stability of Transformer Models

📅 2026-02-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing average sensitivity metrics struggle to characterize the robustness of large language models to input perturbations and fail to adequately capture their "junta-like" input dependence. This work introduces noise stability into Transformer analysis as a novel measure of model simplicity and robustness. Through theoretical analysis of noise stability in single-layer attention and ReLU MLP blocks, and by modeling multi-layer signal propagation via covariance interval propagation, we devise a new regularization strategy for training. The proposed method accelerates training by approximately 35% on algorithmic tasks and 75% on next-token prediction tasks, while consistently promoting the emergence of grokking. Our findings establish a theoretical link between noise stability, model interpretability, and training dynamics.
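For intuition, here is a minimal sketch of what a noise-stability regularizer of this kind could look like, assuming a model that consumes continuous embeddings. The function name, the ρ-mixing scheme, and the squared-output-gap penalty are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch (not the paper's exact method): penalize the gap between
# model outputs on clean embeddings and on embeddings mixed with rho-correlated
# Gaussian noise, mirroring "correlated noise applied to all input coordinates".
import torch

def noise_stability_penalty(model, embeddings, rho=0.9):
    """embeddings: (batch, seq_len, d_model) continuous inputs to the Transformer."""
    noise = torch.randn_like(embeddings)
    # rho-correlated perturbation: keep rho of the signal, mix in sqrt(1 - rho^2) noise
    perturbed = rho * embeddings + (1.0 - rho ** 2) ** 0.5 * noise
    clean_out = model(embeddings)
    noisy_out = model(perturbed)
    # small when the model is noise-stable (outputs barely change under the perturbation)
    return torch.mean((clean_out - noisy_out) ** 2)

# Usage (hypothetical): loss = task_loss + lam * noise_stability_penalty(model, emb)
```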

📝 Abstract
Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model's robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the "junta-like" input dependence we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model's robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately $35\%$ and $75\%$ respectively. Our results establish a new connection between signal propagation in neural networks and interpretability, with noise stability emerging as a powerful tool for understanding and improving modern Transformers.
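As background, the classical definitions of noise stability at correlation $\rho$ from the analysis-of-Boolean-functions literature the abstract alludes to are sketched below; the paper's real-valued generalization for Transformers is presumably a variant of the Gaussian form, so this is context rather than the authors' exact definition.

```latex
% Boolean case: y ~ N_rho(x) keeps each coordinate x_i with probability rho
% and resamples it uniformly from {-1, 1} otherwise.
\mathrm{Stab}_\rho(f) \;=\; \mathbb{E}_{x \sim \{-1,1\}^n,\; y \sim N_\rho(x)}\!\big[f(x)\, f(y)\big]

% Gaussian analogue for real-valued inputs, with z a fresh standard Gaussian:
\mathrm{Stab}_\rho(f) \;=\; \mathbb{E}_{x,\, z \sim \mathcal{N}(0, I_n)}\!\Big[f(x)\, f\!\big(\rho\, x + \sqrt{1-\rho^2}\, z\big)\Big]
```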
Problem

Research questions and friction points this paper is trying to address.

noise stability
average sensitivity
simplicity bias
transformer models
input dependence
Innovation

Methods, ideas, or system contributions that make the work stand out.

noise stability
simplicity bias
Transformer interpretability
covariance interval propagation
grokking acceleration
🔎 Similar Papers
No similar papers found.
Themistoklis Haris
Department of Computer Science, Boston University
Zihan Zhang
National Institute of Informatics
Yuichi Yoshida
National Institute of Informatics
Theoretical Computer Science