🤖 AI Summary
This work investigates whether Transformers can exactly implement classical algorithms, focusing on Lloyd's algorithm for k-means clustering. We propose a fully Transformer-based architecture composed solely of standard components—self-attention, residual connections, layer normalization, and MLPs—and formally construct a neural network that is stepwise equivalent to Lloyd's iterative procedure, with rigorous mathematical proof. To our knowledge, this is the first fully interpretable, purely neural implementation of k-means. Through structured architectural modifications, we derive three novel, interpretable variants: soft clustering, spherical clustering, and trimmed clustering—each preserving algorithmic semantics while extending modeling capabilities. Theoretical analysis and empirical evaluation jointly confirm that the architecture precisely replicates Lloyd iterations within a finite number of steps, and that each variant admits a clear geometric and statistical interpretation.
📝 Abstract
The invention of the transformer architecture has revolutionized Artificial Intelligence (AI), yielding unprecedented success in areas such as natural language processing, computer vision, and multimodal reasoning. Despite these advances, it remains unclear whether transformers can learn and implement precise algorithms. Here, we demonstrate that transformers can exactly implement a fundamental and widely used algorithm for $k$-means clustering: Lloyd's algorithm. First, we theoretically prove the existence of such a transformer architecture, which we term the $k$-means transformer, that exactly implements Lloyd's algorithm for $k$-means clustering using the standard ingredients of modern transformers: attention and residual connections. Next, we numerically implement this transformer and demonstrate in experiments the exact correspondence between our architecture and Lloyd's algorithm, providing a fully neural implementation of $k$-means clustering. Finally, we demonstrate that interpretable alterations (e.g., incorporating layer normalization or multilayer perceptrons) to this architecture yield diverse and novel variants of clustering algorithms, such as soft $k$-means, spherical $k$-means, trimmed $k$-means, and more. Collectively, our findings demonstrate how transformer mechanisms can precisely map onto algorithmic procedures, offering a clear and interpretable perspective on implementing precise algorithms in transformers.
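For readers unfamiliar with the classical procedure the paper neuralizes, here is a minimal NumPy sketch of Lloyd's algorithm — the alternating assignment and centroid-update steps that the $k$-means transformer is proven to replicate. Function and variable names, the initialization scheme, and the stopping rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups via Lloyd's iterations (sketch)."""
    rng = np.random.default_rng(seed)
    # Initialize centers as k distinct data points (one common choice).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        # under squared Euclidean distance.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (keeping the old center if a cluster becomes empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # fixed point reached
            break
        centers = new_centers
    return centers, labels
```

The hard `argmin` assignment is the step that a soft variant (as in the paper's soft $k$-means) would replace with a softmax-weighted assignment, which is what makes the attention-based formulation natural.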