Attention-based PCA

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work establishes a theoretical connection between attention mechanisms and principal component analysis (PCA) in unsupervised learning. Using Gaussian and spiked Wishart data models together with tools from random matrix theory and spectral analysis, the study gives an explicit characterization of how attention aligns with PCA. It shows that both softmax and linear attention layers learn parameters aligned with the leading eigendirections of the data covariance matrix, with provable convergence to a globally optimal solution in the infinite-prompt limit and the same behavior up to sampling effects in the finite-prompt regime. In the in-context spiked Wishart setting, attention also recovers the underlying signal direction. These findings give a theoretical foundation for the ability of attention mechanisms to learn meaningful data embeddings without supervision.

📝 Abstract

We study attention mechanisms through the lens of a canonical unsupervised problem: principal component analysis (PCA). We show that, when trained on Gaussian data, both softmax and linear attention layers learn parameters that align with the principal eigenvectors of the covariance matrix, thereby establishing a direct and explicit connection with PCA. Our analysis covers both finite and infinite prompt regimes. In the infinite-prompt limit, we prove convergence to globally optimal solutions aligned with the leading spectral direction, while in the finite-prompt setting we show that the same behavior emerges up to sampling effects. We further extend the analysis to an in-context setting with spiked Wishart covariances, where attention successfully recovers the underlying signal direction. These results demonstrate that attention inherently performs PCA-like computations under unsupervised objectives, providing a theoretical foundation for its representation-learning capabilities.
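As a rough illustration of why linear attention and PCA are related, the sketch below uses a toy setup that is an assumption of this note, not the paper's training objective or proof. With an identity weight matrix, a linear attention readout over a prompt of mean-zero samples equals the sample covariance applied to the query, so repeating it with normalization is power iteration toward the leading eigenvector. The names `linear_attention`, `strength`, and the spiked covariance `I + strength * u u^T` are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, strength = 8, 5000, 4.0  # dimension, prompt length, spike size (assumed values)

# Spiked Gaussian data: covariance is I + strength * u u^T for a unit vector u.
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
X = rng.standard_normal((n, d)) + np.sqrt(strength) * rng.standard_normal((n, 1)) * u


def linear_attention(query, prompt, W):
    """Linear attention readout: (1/n) * sum_j (query^T W x_j) x_j."""
    scores = prompt @ (W.T @ query)          # one score per prompt token
    return prompt.T @ scores / prompt.shape[0]


# With W = I the readout is (X^T X / n) @ query, so iterating it and
# renormalizing is power iteration on the sample covariance.
W = np.eye(d)
q = rng.standard_normal(d)
q /= np.linalg.norm(q)
for _ in range(50):
    q = linear_attention(q, X, W)
    q /= np.linalg.norm(q)

# Reference PCA direction from an eigendecomposition of the sample covariance.
eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
v_top = eigvecs[:, -1]                       # eigh sorts eigenvalues ascending

align_pca = abs(q @ v_top)                   # agreement with the top principal component
align_signal = abs(q @ u)                    # agreement with the planted signal direction
print(f"alignment with top PC: {align_pca:.4f}")
print(f"alignment with signal: {align_signal:.4f}")
```

Both alignments are absolute cosine similarities, since an eigenvector is only defined up to sign. The finite prompt length `n` is what keeps the recovered direction from matching `u` exactly, which loosely mirrors the "up to sampling effects" qualification in the abstract.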

Problem

Research questions and friction points this paper is trying to address.

attention mechanism

principal component analysis

unsupervised learning

covariance matrix

representation learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

attention mechanism

principal component analysis

unsupervised learning