Multistability of Self-Attention Dynamics in Transformers

📅 2025-11-14
🤖 AI Summary
This work studies self-attention dynamics, a continuous-time multiagent-like model of the attention mechanism in Transformers, with a focus on multistability. It relates these dynamics to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix, which for Transformers is the value matrix. The equilibria of the single-head self-attention system are classified into four classes: consensus, bipartite consensus, clustering, and polygonal equilibria. Multiple asymptotically stable equilibria from the first three classes often coexist. Equilibria from the first two classes (consensus and bipartite consensus) are always aligned with eigenvectors of the value matrix, often but not exclusively with the principal eigenvector. The work offers a dynamical-systems perspective on how internal representations evolve in Transformers and connects self-attention to a classical learning flow.

📝 Abstract
In machine learning, a self-attention dynamics is a continuous-time multiagent-like model of the attention mechanisms of transformers. In this paper we show that such dynamics is related to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix, which for transformers is the value matrix. We classify the equilibria of the "single-head" self-attention system into four classes: consensus, bipartite consensus, clustering and polygonal equilibria. Multiple asymptotically stable equilibria from the first three classes often coexist in the self-attention dynamics. Interestingly, equilibria from the first two classes are always aligned with the eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.
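
As a minimal illustration of the Oja flow mentioned in the abstract, the sketch below integrates dx/dt = A x - (xᵀ A x) x with a forward Euler step and checks that the state aligns with the principal eigenvector. It assumes a symmetric matrix with a positive, well-separated top eigenvalue; the function name `oja_flow`, the test spectrum, the step size, and the per-step renormalisation are illustrative choices, not taken from the paper.

```python
import numpy as np

def oja_flow(A, x0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = A x - (x^T A x) x on the unit sphere."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        Ax = A @ x
        x = x + dt * (Ax - (x @ Ax) * x)
        x = x / np.linalg.norm(x)  # renormalise to counter Euler drift off the sphere
    return x

rng = np.random.default_rng(0)

# Assumed test matrix: symmetric, with known eigenvalues 3 > 1 > 0.5 > -1.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([3.0, 1.0, 0.5, -1.0]) @ Q.T

x_final = oja_flow(A, rng.standard_normal(4))

# Compare with the principal eigenvector (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(A)
principal = eigvecs[:, -1]
alignment = abs(x_final @ principal)  # close to 1 when aligned, up to sign
print(f"alignment with principal eigenvector: {alignment:.6f}")
```

The sign of the limit depends on the initial condition, which is why the check uses the absolute value of the inner product.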
Problem

Research questions and friction points this paper is trying to address.

Analyzes multistability in the self-attention dynamics of transformer models
Classifies the equilibria of the single-head self-attention system into four classes
Investigates the coexistence of multiple asymptotically stable equilibria and their alignment with eigenvectors of the value matrix
Innovation

Methods, ideas, or system contributions that make the work stand out.

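Self-attention dynamics modeled as a continuous-time multiagent system and related to a multiagent version of the Oja flow
Equilibria classified into four classes: consensus, bipartite consensus, clustering, and polygonal
Consensus and bipartite consensus equilibria are always aligned with eigenvectors of the value matrix, often but not exclusively the principal one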
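
The points above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's exact model: tokens are constrained to the unit sphere, attention logits are `beta * <x_i, x_j>` (query and key matrices taken as identity), the value matrix `V` is diagonal with a positive principal eigenvalue, and all tokens start in the open hemisphere around the principal eigenvector. The names `attention_step` and `simulate`, and all numerical values, are hypothetical.

```python
import numpy as np

def attention_step(X, V, beta, dt):
    """One Euler step of dx_i/dt = P_{x_i}(sum_j w_ij V x_j), then renormalise."""
    logits = beta * (X @ X.T)
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)             # rows are attention weights
    Y = W @ X @ V.T                               # row i holds sum_j w_ij V x_j
    radial = np.sum(X * Y, axis=1, keepdims=True)
    X_new = X + dt * (Y - radial * X)             # tangent projection onto the sphere
    return X_new / np.linalg.norm(X_new, axis=1, keepdims=True)

def simulate(X0, V, beta=1.0, dt=0.05, steps=4000):
    X = X0 / np.linalg.norm(X0, axis=1, keepdims=True)
    for _ in range(steps):
        X = attention_step(X, V, beta, dt)
    return X

rng = np.random.default_rng(1)
V = np.diag([2.0, 1.0, 0.5])                      # assumed value matrix; e1 is principal

# Assumed initial condition: 8 tokens with positive first coordinate.
X0 = rng.standard_normal((8, 3))
X0[:, 0] = np.abs(X0[:, 0]) + 0.5

X = simulate(X0, V)
min_alignment = X[:, 0].min()                     # smallest inner product with e1
print(f"smallest alignment with e1: {min_alignment:.6f}")
```

In this assumed setup, when all tokens coincide the softmax weights sum to one and each token obeys dx/dt = V x - (xᵀ V x) x, which is the single-agent Oja flow; this is one way to see why a consensus state sits on an eigenvector of the value matrix. Other initial conditions or value matrices may settle in different equilibrium classes.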