Localmax dynamics for attention in transformers and its asymptotic behavior

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies the dynamic evolution of attention in Transformers by proposing a novel discrete-time attention model, *localmax dynamics*, which interpolates between softmax and hardmax. It relaxes hardmax in a controlled way through a time-varying alignment-sensitivity parameter that modulates neighborhood influence. Methodologically, the authors first define the *quiescent set* to characterize the invariant behavior of tokens, then combine dynamical-systems theory, convex geometry, and Lyapunov analysis to establish asymptotic convergence guarantees under vanishing, nonzero, and fully time-varying parameters. Theoretically, they prove that the convex hull of the token states converges to a convex polytope whose limit cannot be fully characterized by the maximal alignment set, and that finite-time convergence is unattainable. The framework advances the understanding of asymmetric, time-varying attention dynamics and recovers hardmax as a limiting case.

📝 Abstract
We introduce a new discrete-time attention model, termed the localmax dynamics, which interpolates between the classic softmax dynamics and the hardmax dynamics, where only the tokens that maximize the influence toward a given token have a positive weight. As in hardmax, uniform weights are determined by a parameter controlling neighbor influence, but the key extension lies in relaxing neighborhood interactions through an alignment-sensitivity parameter, which allows controlled deviations from pure hardmax behavior. As we prove, while the convex hull of the token states still converges to a convex polytope, its structure can no longer be fully described by a maximal alignment set, prompting the introduction of quiescent sets to capture the invariant behavior of tokens near vertices. We show that these sets play a key role in understanding the asymptotic behavior of the system, even under time-varying alignment sensitivity parameters. We further show that localmax dynamics does not exhibit finite-time convergence and provide results for vanishing, nonzero, and time-varying alignment-sensitivity parameters, recovering the limiting behavior of hardmax as a by-product.
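The abstract describes tokens that place uniform positive weight only on the tokens within an alignment-sensitivity margin of the maximal influence. A minimal sketch of one such update is below; the exact formulation is not given in this summary, so the inner-product alignment, the threshold rule `align >= max - eps`, and the function name `localmax_step` are all assumptions made for illustration, not the paper's definition.

```python
import numpy as np

def localmax_step(X, eps):
    """One discrete-time localmax-style update (illustrative sketch).

    Each token moves to the average of the tokens whose alignment
    (here: inner product) with it lies within eps of the maximum;
    those tokens get equal weight, all others get weight zero.
    eps = 0 mimics hardmax (only maximizers attend); larger eps
    widens the attended neighborhood toward uniform averaging.
    """
    n = X.shape[0]
    X_next = np.empty_like(X)
    for i in range(n):
        align = X @ X[i]                      # alignment of every token with token i
        mask = align >= align.max() - eps     # tokens within eps of the maximum
        X_next[i] = X[mask].mean(axis=0)      # uniform weights on the selected set
    return X_next

# Example: iterate the dynamics from random token states
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
for _ in range(50):
    X = localmax_step(X, eps=0.1)
```

Since every updated state is a convex combination of current states, the convex hull of the tokens can only shrink under this sketch, consistent with the convergence of the hull to a polytope stated above.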
Problem

Research questions and friction points this paper is trying to address.

Modeling attention dynamics between softmax and hardmax in transformers
Analyzing asymptotic convergence behavior with varying alignment sensitivity
Introducing quiescent sets to capture invariant token behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Localmax dynamics interpolates softmax and hardmax attention
Introduces alignment-sensitivity parameter for controlled deviation
Uses quiescent sets to capture asymptotic token behavior