Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models

📅 2026-04-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work investigates the continuous-time limit of token evolution in Transformers of finite depth and width, and the synchronization mechanisms that arise in this limit. By establishing pathwise convergence, the layerwise token dynamics of a Transformer with MLP blocks are mapped to a continuous-time stochastic interacting particle system. This yields a stochastic partial differential equation that governs the evolution of the token distribution. The study proves rigorously, with quantitative bounds, that the layerwise dynamics exhibit propagation of chaos when the number of tokens is large, and it uncovers noise-induced synchronization. The analysis assumes that the common noise is coercive, a condition the paper characterizes in terms of the activation function. Under this assumption an exchangeable scaling-limit framework is constructed, and it is shown that in the strong common-noise regime the system's mean interaction energy decays exponentially.
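The following is a hedged illustration of what such a limit can look like. It writes out a generic mean-field particle system with common noise and its stochastic Fokker-Planck equation. Every specific choice is an assumption made for exposition and is not taken from the paper:

- tokens $X^i_t \in \mathbb{R}^d$;
- a self-attention drift $b$ with query, key and value matrices set to the identity and inverse temperature $\beta$;
- a diffusion coefficient $\sigma:\mathbb{R}^d \to \mathbb{R}^{d\times m}$;
- a single $m$-dimensional Brownian motion $W$ shared by all tokens, which plays the role of the common noise.

The last line defines a hypothetical interaction energy $\mathcal{E}$ and shows only the schematic shape of an on-average exponential bound, with an unspecified rate $\lambda > 0$.

```latex
% Hypothetical particle system with common noise (illustrative, not the paper's model)
dX^i_t = b\big(X^i_t, \mu^N_t\big)\,dt + \sigma\big(X^i_t\big)\,dW_t,
\qquad \mu^N_t = \frac{1}{N}\sum_{j=1}^{N} \delta_{X^j_t},
\qquad b(x,\mu) = \frac{\int e^{\beta\langle x, y\rangle}\, y \,\mu(dy)}{\int e^{\beta\langle x, y\rangle}\,\mu(dy)}

% Stochastic Fokker-Planck equation for the limit law mu_t (Ito form, common noise W)
d\mu_t = \Big[-\nabla\cdot\big(b(\cdot,\mu_t)\,\mu_t\big)
  + \tfrac{1}{2}\sum_{k,l=1}^{d} \partial_k\partial_l\big((\sigma\sigma^{\top})_{kl}\,\mu_t\big)\Big]\,dt
  - \nabla\cdot\big(\sigma\,\mu_t\,dW_t\big)

% Schematic shape of exponential dissipation on average (energy and rate are assumptions)
\mathcal{E}(\mu) = \iint |x-y|^2\,\mu(dx)\,\mu(dy),
\qquad \mathbb{E}\,\mathcal{E}(\mu_t) \le e^{-\lambda t}\,\mathcal{E}(\mu_0)
```

The stochastic integral term appears because every token is driven by the same $W$. As a result, the limiting distribution $\mu_t$ stays random rather than deterministic.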
📝 Abstract
We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with multilayer perceptron (MLP) blocks to a continuous-time stochastic interacting particle system. We also identify the stochastic partial differential equation describing the evolution of the tokens' distribution in this limit and prove propagation of chaos when the number of such tokens is large. The bounds we establish are quantitative and the limits we consider commute. We further prove that the limiting stochastic model displays synchronization by noise and establish exponential dissipation of the interaction energy on average, provided that the common noise is sufficiently coercive relative to the deterministic self-attention drift. We finally characterize the activation functions satisfying this condition.
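The Python sketch below makes "synchronization by noise" concrete. It is a minimal Euler-Maruyama scheme in plain Python that rests on several stated assumptions:

- The drift is a toy self-attention map with Q = K = V = I and inverse temperature `beta`.
- The common noise enters each coordinate through a hypothetical diffusion coefficient `noise * activation(x)`, where `activation` defaults to `tanh`.
- One Brownian increment per coordinate is shared by all tokens.
- The energy is the mean squared pairwise distance between tokens, which may differ from the paper's definition.

The helper names (`attention_drift`, `interaction_energy`, `simulate`) are illustrative. The sketch does not reproduce the paper's model, its rates, or its characterization of activation functions.

```python
import math
import random


def attention_drift(tokens, beta):
    """Toy self-attention drift with Q = K = V = identity (an assumption).

    Token i moves toward the softmax-weighted average of all tokens, with
    weights proportional to exp(beta * <x_i, x_j>).
    """
    drift = []
    for x in tokens:
        scores = [beta * sum(a * b for a, b in zip(x, y)) for y in tokens]
        m = max(scores)  # shift by the max score for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        drift.append(
            [sum(w * y[k] for w, y in zip(weights, tokens)) / z for k in range(len(x))]
        )
    return drift


def interaction_energy(tokens):
    """Mean squared pairwise distance (a hypothetical choice of energy)."""
    n = len(tokens)
    total = 0.0
    for xi in tokens:
        for xj in tokens:
            total += sum((a - b) ** 2 for a, b in zip(xi, xj))
    return total / (n * n)


def simulate(tokens, beta, noise, dt, steps, activation=math.tanh, seed=0):
    """Euler-Maruyama for dX^i = b(X^i, mu) dt + noise * activation(X^i) dW.

    W is a single d-dimensional Brownian motion shared by all tokens (common
    noise), applied coordinatewise. Returns final tokens and the energy path.
    """
    rng = random.Random(seed)
    x = [list(t) for t in tokens]
    d = len(x[0])
    energies = [interaction_energy(x)]
    for _ in range(steps):
        # One Brownian increment per coordinate, identical for every token.
        dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(d)]
        b = attention_drift(x, beta)
        x = [
            [xi[k] + bi[k] * dt + noise * activation(xi[k]) * dw[k] for k in range(d)]
            for xi, bi in zip(x, b)
        ]
        energies.append(interaction_energy(x))
    return x, energies


if __name__ == "__main__":
    tokens0 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
    for noise in (0.0, 3.0):
        _, energies = simulate(tokens0, beta=1.0, noise=noise, dt=0.01, steps=500)
        print(f"noise={noise}: energy {energies[0]:.4f} -> {energies[-1]:.4f}")
```

Every token receives the same increment `dw`. Differences between tokens are therefore driven only by differences in the drift and in `activation(x)`. When the noise is strong and the activation is strictly increasing on the relevant range, these differences can shrink along typical paths, which gives a toy version of what the abstract calls synchronization by noise. To explore this, compare `energies[0]` with `energies[-1]` across a few seeds and noise levels.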
Problem

Research questions and friction points this paper is trying to address.

stochastic scaling limits
synchronization by noise
transformer models
interacting particle systems
propagation of chaos
Innovation

Methods, ideas, or system contributions that make the work stand out.

stochastic scaling limits
synchronization by noise
propagation of chaos
interacting particle systems
transformer dynamics