Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements

📅 2025-07-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Transformers lack a geometric interpretation, which makes their internal behaviour hard to reason about. Method: This work reinterprets transformers as unrolled inference in a Probabilistic Laplacian Eigenmaps (PLE) model from the ProbDR framework, treating the attention matrix as a graph adjacency matrix. The derivation yields a graph Laplacian term inside the transformer block where the standard architecture uses the attention matrix; subtracting the identity from the attention matrix recovers this term and amounts to taking a graph diffusion step. The same framework accounts for both the "linear" dimensionality reduction that transformers perform at initialisation and the graph Laplacian structure within each block. Contribution/Results: The study argues that transformers can be read as performing probabilistic dimensionality reduction together with diffusion on a graph. Experiments show that subtracting the identity from the attention matrix improves validation performance on a language model and a simple vision transformer.
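
To make the link between "subtracting the identity" and "graph diffusion" concrete, the following is a short worked sketch rather than the paper's own derivation. It assumes softmax attention with rows that sum to one; the symbols $A$, $D$, $L$, $X_t$ and the step size $\tau$ are notation introduced here for illustration.

```latex
% Assumption: A is a row-stochastic softmax attention matrix, read as a
% weighted adjacency matrix; \mathbf{1} is the all-ones vector.
\begin{align}
  A\mathbf{1} &= \mathbf{1}
    \quad\Longrightarrow\quad D = \operatorname{diag}(A\mathbf{1}) = I, \\
  L &= I - D^{-1}A = I - A
    \quad\text{(random-walk graph Laplacian)}, \\
  A - I &= -L, \\
  X_{t+1} &= X_t + \tau\,(A - I)\,X_t = X_t - \tau\,L\,X_t
    \quad\text{(forward-Euler step of } \dot{X} = -LX\text{)}.
\end{align}
```

Under these assumptions, replacing $A$ with $A - I$ swaps an adjacency-weighted average for a negative Laplacian, which is the operator that drives heat diffusion on a graph.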

📝 Abstract

We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform "linear" dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.
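
The modification described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single attention head with no causal mask, and the function names (`attention_matrix`, `standard_attention`, `diffusion_attention`) and the weight matrices `w_q`, `w_k`, `w_v` are hypothetical names chosen for this example.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def attention_matrix(x, w_q, w_k):
    """Row-stochastic attention matrix A, read here as a graph adjacency matrix."""
    q, k = x @ w_q, x @ w_k
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))


def standard_attention(x, w_q, w_k, w_v):
    """Usual single-head attention output: A (X W_v)."""
    return attention_matrix(x, w_q, w_k) @ (x @ w_v)


def diffusion_attention(x, w_q, w_k, w_v):
    """Attention with the identity subtracted: (A - I)(X W_v)."""
    a = attention_matrix(x, w_q, w_k)
    return (a - np.eye(a.shape[0])) @ (x @ w_v)


# Toy inputs (assumed sizes, random weights) to exercise both variants.
rng = np.random.default_rng(0)
n_tokens, d_model = 5, 8
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)

a = attention_matrix(x, w_q, w_k)
out_std = standard_attention(x, w_q, w_k, w_v)
out_diff = diffusion_attention(x, w_q, w_k, w_v)
```

In this sketch the two outputs differ exactly by the value projection `x @ w_v`, and every row of `a - I` sums to zero, which is the property expected of a (negative) graph Laplacian. How the change interacts with multiple heads, masking, and the residual connection in the paper's experiments is not covered here.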

Problem

Research questions and friction points this paper is trying to address.

Interpret transformers as probabilistic Laplacian Eigenmaps

Show that transformers perform "linear" dimensionality reduction at initialisation

Improve performance via graph diffusion step in attention

Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic Laplacian Eigenmaps model interpretation

Linear dimensionality reduction at initialization

Graph diffusion step improves validation performance
