Propagation of Chaos in Contextual Flow Maps

📅 2026-05-15

🤖 AI Summary
This work addresses the lack of theoretical tools for quantifying how far a finite-context Transformer deviates from an idealized infinite-context system in the large-context regime. The authors abstract Transformers as contextual flow maps (CFMs), dynamical systems that evolve a distinguished token against a contextual measure, and apply propagation of chaos, together with the McKean–Vlasov structure of the dynamics and Wasserstein metrics, to bound the deviation between finite- and infinite-context CFMs uniformly along network depth and across iterations of online gradient descent. By expressing the loss gradient through a new Eulerian adjoint formulation and analyzing the stability of the resulting forward–adjoint system, they obtain the optimal Wasserstein rate $n^{-1/d}$ for general CFMs and the parametric rate $n^{-1/2}$ for a restricted class that includes Transformers, where $n$ is the context length.
📝 Abstract
We develop a quantitative statistical theory of transformers in the large-context regime by adopting the abstraction of contextual flow maps (CFMs): dynamical systems that evolve a distinguished token in the presence of a contextual measure across a stack of attention blocks. Within this framework, the finite-context model approximates an idealized infinite-context system in which the contextual measure is replaced by its underlying population, so that the context length $n$ becomes a statistical resource. Exploiting the McKean–Vlasov structure of the dynamics and the classical machinery of propagation of chaos, we establish a forward bound controlling the deviation between the finite- and infinite-context CFMs uniformly along depth, and a backward bound controlling the deviation between the corresponding training trajectories uniformly across iterations of online gradient descent. Both bounds achieve the optimal Wasserstein rate $n^{-1/d}$ for general CFMs and parametric rate $n^{-1/2}$ for a restricted class of CFMs that includes transformers as a special case. The analysis rests on a new Eulerian adjoint formulation of the loss gradient and stability estimates for the resulting forward–adjoint system, both of which may be of independent interest.
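To make the idea of context length as a statistical resource concrete, here is a minimal numerical sketch of the parametric rate $n^{-1/2}$ for a single softmax-attention step. It is not the paper's construction. The function names, the standard Gaussian context, the inverse temperature `beta`, and the residual update `x + attention` are all assumptions chosen for illustration. For a standard Gaussian context the population (infinite-context) attention output has the closed form `beta * x`, so the deviation of the finite-context step can be measured directly.

```python
import numpy as np

def attention_step(x, context, beta=1.0):
    """One residual softmax-attention update of token x against a finite context.

    context has shape (n, d): n context tokens of dimension d (hypothetical setup).
    """
    scores = beta * (context @ x)             # (n,) attention logits
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return x + weights @ context              # residual update

def population_step(x, beta=1.0):
    """Infinite-context limit for a standard Gaussian context (assumption).

    Tilting N(0, I) by exp(beta * <x, y>) gives N(beta * x, I), whose mean is beta * x.
    """
    return x + beta * x

def mean_deviation(n, x, rng, trials=200):
    """Average distance between finite- and infinite-context updates."""
    reference = population_step(x)
    errors = []
    for _ in range(trials):
        context = rng.standard_normal((n, x.size))
        errors.append(np.linalg.norm(attention_step(x, context) - reference))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
x = np.full(4, 0.5)                           # distinguished token, d = 4
ns = np.array([64, 256, 1024, 4096])          # context lengths
deviations = np.array([mean_deviation(n, x, rng) for n in ns])

# Slope of log(deviation) against log(n); a value near -0.5 is consistent
# with the parametric rate in this toy setting.
slope = np.polyfit(np.log(ns), np.log(deviations), 1)[0]
print("deviations:", deviations)
print("fitted log-log slope:", slope)
```

The sketch covers only one attention step at fixed parameters; the forward bound in the abstract is uniform along depth, and the backward bound concerns training trajectories, neither of which this toy experiment attempts to reproduce.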
Problem

Research questions and friction points this paper is trying to address.

Propagation of Chaos
Contextual Flow Maps
Transformers
Large-context regime
McKean–Vlasov dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual Flow Maps
Propagation of Chaos
McKean–Vlasov dynamics
Transformer theory
Eulerian adjoint formulation