🤖 AI Summary
This work addresses two limitations of conventional Transformers: they offer only indirect control over the variance of token representations, and they provide no explicit guidance toward an ideal embedding geometry. The authors propose Laplacian Attention, a novel mechanism that connects token variance regulation with the Neural Collapse phenomenon. The mechanism encourages embeddings of tokens from the same class to collapse toward their class mean while arranging the class means into a maximally separable geometric configuration. Laplacian Attention yields consistent performance gains across multiple vision and language benchmarks, and analyses using principal component analysis and Neural Collapse metrics confirm that it induces the intended representational geometry.
📝 Abstract
Transformers leverage attention, the residual connection, and layer normalization to control the variance of token representations. We propose to modify attention into a Laplacian mechanism that gives the model more direct control over token variance. We conjecture that this helps transformers achieve the ideal token geometry. To investigate our conjecture, we first show that incorporating the Laplacian mechanism into transformers induces consistent improvements across benchmarks in computer vision and language. Next, we study how the Laplacian mechanism impacts the geometry of token representations using four tools: 1) principal component analysis, 2) the cosine similarity metric, 3) analysis of variance, and 4) Neural Collapse metrics. Our investigation shows that the Laplacian mechanism reshapes token embeddings toward a geometry of maximal separability: tokens collapse according to their classes, and the class means exhibit Neural Collapse.
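To make the geometric diagnostics above concrete, the sketch below shows one common way such Neural Collapse metrics are computed on class-labeled embeddings. This is an illustrative implementation, not the paper's code: the function name `neural_collapse_metrics` and the exact scatter ratios are assumptions. It measures (a) within-class variability relative to between-class variability (small values indicate tokens collapsing onto their class means) and (b) the average pairwise cosine similarity of the centered class means (a maximally separable simplex configuration of C classes gives equal similarities of -1/(C-1)).

```python
import numpy as np

def neural_collapse_metrics(embeddings, labels):
    """Simple Neural Collapse diagnostics for class-labeled embeddings.

    Returns:
        nc1: within-class scatter / between-class scatter
             (smaller = tighter collapse of samples onto class means).
        cos_mean: mean pairwise cosine similarity of centered class means
             (a simplex ETF of C classes gives -1/(C-1) for every pair).
    """
    classes = np.unique(labels)
    global_mean = embeddings.mean(axis=0)
    class_means = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in classes]
    )

    # Within-class scatter: mean squared distance of samples to their class mean.
    within = np.mean([
        np.sum((embeddings[labels == c] - class_means[i]) ** 2, axis=1).mean()
        for i, c in enumerate(classes)
    ])
    # Between-class scatter: mean squared distance of class means to the global mean.
    between = np.mean(np.sum((class_means - global_mean) ** 2, axis=1))
    nc1 = within / between

    # Average pairwise cosine similarity of the centered class means.
    centered = class_means - global_mean
    normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    cos = normed @ normed.T
    cos_mean = cos[~np.eye(len(classes), dtype=bool)].mean()
    return nc1, cos_mean
```

On synthetic data with three classes clustered tightly around directions 120 degrees apart, `nc1` is near zero and `cos_mean` is near -0.5, the simplex value for C = 3.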