🤖 AI Summary
This work investigates the intrinsic ability of Transformer self-attention to discover cluster structure in unsupervised settings, focusing on automatic cluster identification when the data are generated by a Gaussian mixture model (GMM). We propose a simplified two-head attention architecture and establish a theoretical guarantee: minimizing the population risk over unlabeled data alone suffices for the attention-head parameters to converge to the true GMM mixture centers. Through a rigorous analysis of the attention-weight dynamics, we prove that these weights spontaneously align with the underlying cluster centers, revealing the principle that "attention is clustering." Our analysis operates at the population level and provides, to our knowledge, the first provable theoretical foundation for unsupervised learning in Transformers, formally bridging self-attention mechanisms and clustering theory.
📝 Abstract
Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids.
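The abstract's setup can be illustrated with a small numerical sketch. The paper's exact two-head model and population risk are not reproduced here; this stand-in uses a reconstruction-style empirical risk, `E‖x − softmax(β·xPᵀ)P‖²`, with an EM-style gradient step (the attention weights are frozen within each update), and the GMM centers, sample size, and step size below are all hypothetical choices for illustration. Under these assumptions, the two head parameter vectors drift toward the two mixture centers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-component GMM in 2D with centers mu_true (hypothetical choice).
mu_true = np.array([[3.0, 0.0], [-3.0, 0.0]])
n = 4000
labels = rng.integers(0, 2, size=n)          # used only to generate data
X = mu_true[labels] + rng.normal(scale=0.5, size=(n, 2))

# Two attention "heads", one parameter vector per head; initialize from
# two samples that are far apart (an unsupervised heuristic).
far = int(np.argmax(np.linalg.norm(X - X[0], axis=1)))
P = np.stack([X[0], X[far]]).copy()

beta, lr = 1.0, 0.5
for _ in range(300):
    scores = beta * X @ P.T                            # (n, 2) head scores
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # softmax over heads
    err = w @ P - X                                    # reconstruction error
    P -= lr * (w.T @ err) / n                          # gradient in P, w frozen

# Each head parameter should land near one true mixture center
# (up to a permutation of the heads).
dist = np.linalg.norm(P[:, None, :] - mu_true[None, :, :], axis=2)
print(dist.min(axis=1))
```

With the softmax attention acting as a soft cluster assignment, the update is essentially a soft k-means step, which is one concrete way to read the "attention is clustering" principle.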