🤖 AI Summary
This work addresses a limitation of existing mean-field theories of Transformers, which largely treat model parameters as prescribed and so leave open how training affects attention-induced token clustering. The authors study a simplified noisy mean-field Transformer in which only a parameter-linear feedforward network (FFN) is trained, under $L^2$ regularization, and ask how training reshapes the token clustering seen during inference. Using an entropy-regularized interaction energy that captures the clustering bias of attention, they identify and analyze a training-induced phase: after initially following attention-driven clustering, the token distribution can leave the clustered regime near the final layers. The results point toward a training-aware mean-field theory in which training and inference dynamics are treated together.
📝 Abstract
Transformers perform inference by iteratively transforming token representations across layers. This layerwise computation has been studied empirically, and recent mean-field theories of Transformer dynamics explain how attention can drive token distributions toward clustering. However, existing mean-field analyses largely treat model parameters as prescribed, leaving open how training reshapes this clustering picture. We study this question in a noisy mean-field Transformer in which only a parameter-linear FFN is trained under $L^2$ regularization. We find and analyze a training-induced phase in the dynamics: after initially following attention-driven clustering, the token distribution can leave the clustered regime near the final layers. Our mathematical analysis is based on an entropy-regularized interaction energy that captures the clustering bias of attention. More broadly, our results point toward a training-aware mean-field theory of Transformer dynamics, in which training and inference dynamics are treated together.
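To make the attention-driven clustering baseline concrete, the following is a minimal sketch of a commonly used toy model of self-attention dynamics: tokens on the unit sphere, identity query, key, and value matrices, and one Euler step per layer. This is an illustrative assumption rather than the model analyzed in the paper: it has no FFN and no training, and the names `attention_step`, `mean_alignment`, and `simulate`, along with all parameter values, are hypothetical.

```python
import numpy as np

def attention_step(x, beta, dt, noise, rng):
    """One Euler step of (optionally noisy) self-attention dynamics on the sphere."""
    # Softmax attention weights from pairwise inner products (identity Q, K).
    logits = beta * (x @ x.T)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # Attention output with identity V, projected onto each token's tangent space.
    drift = weights @ x
    drift -= np.sum(drift * x, axis=1, keepdims=True) * x
    # Euler step plus isotropic Gaussian noise, then renormalize to the sphere.
    x = x + dt * drift + np.sqrt(2.0 * noise * dt) * rng.standard_normal(x.shape)
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def mean_alignment(x):
    """Average inner product over distinct token pairs (1.0 means one cluster)."""
    n = x.shape[0]
    gram = x @ x.T
    return (gram.sum() - np.trace(gram)) / (n * (n - 1))

def simulate(n_tokens=64, dim=3, beta=1.0, dt=0.1, n_layers=200, noise=0.0, seed=0):
    """Run the toy dynamics and record the alignment after every layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_tokens, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    history = [mean_alignment(x)]
    for _ in range(n_layers):
        x = attention_step(x, beta, dt, noise, rng)
        history.append(mean_alignment(x))
    return x, history

if __name__ == "__main__":
    _, history = simulate()
    print(f"alignment at first layer: {history[0]:.3f}")
    print(f"alignment at last layer:  {history[-1]:.3f}")
```

In this sketch, a rising `mean_alignment` across layers is the signature of clustering. Setting `noise` above zero adds a diffusion term that competes with the attention drift, which is the kind of trade-off an entropy-regularized interaction energy is meant to capture. The trained FFN studied in the paper would enter as an additional drift term and is deliberately left out here.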