The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It

📅 2026-02-10
🤖 AI Summary
This work addresses the failure of Lloyd’s k-means algorithm in high-dimensional, high-noise settings, where, with high probability, essentially every partition of the data is a fixed point, so the algorithm cannot recover even clearly separable cluster structure. Through probabilistic analysis, the authors prove a fundamental divergence between the high-dimensional behavior of Lloyd’s and Hartigan’s k-means algorithms: Lloyd’s method degenerates into returning its initialization, whereas Hartigan’s algorithm does not exhibit this pathology and can still converge to the correct clustering. The result offers a theoretical explanation for the frequent empirical failure of standard k-means in high dimensions and a theoretical basis for preferring Hartigan’s approach under such conditions.
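The mechanism behind this divergence can be seen by writing the two update rules side by side for a point x_i currently in cluster A (with n_A points and centroid c_A) and a second cluster B (n_B points, centroid c_B). The following is a standard formulation of both rules in our own notation, followed by a pure-noise heuristic that assumes x_j = σ g_j with g_j ~ N(0, I_d) independent; it is a sketch of the intuition, not the paper's proof.

```latex
\begin{align*}
\text{Lloyd moves } x_i \text{ to } B
  &\iff \|x_i - c_B\|^2 < \|x_i - c_A\|^2, \\
\text{Hartigan moves } x_i \text{ to } B
  &\iff \frac{n_B}{n_B+1}\|x_i - c_B\|^2 < \frac{n_A}{n_A-1}\|x_i - c_A\|^2
        = \frac{n_A-1}{n_A}\bigl\|x_i - c_A^{(-i)}\bigr\|^2,
  \qquad c_A^{(-i)} = \frac{n_A c_A - x_i}{n_A-1}, \\
\mathbb{E}\|x_i - c_A\|^2 &= \sigma^2 d\,\frac{n_A-1}{n_A},
  \qquad \mathbb{E}\|x_i - c_B\|^2 = \sigma^2 d\,\frac{n_B+1}{n_B}, \\
\mathbb{E}\Bigl[\tfrac{n_A}{n_A-1}\|x_i - c_A\|^2\Bigr]
  &= \mathbb{E}\Bigl[\tfrac{n_B}{n_B+1}\|x_i - c_B\|^2\Bigr] = \sigma^2 d.
\end{align*}
```

Under Lloyd's rule each point therefore sees its own cluster as closer by about σ²d(1/n_A + 1/n_B), because its own centroid contains the point itself; this bias grows linearly with d. In this heuristic, Hartigan's size weights cancel the bias exactly in expectation, so its moves are driven by the actual cluster signal rather than by the self-term.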

📝 Abstract
Lloyd's k-means algorithm is one of the most widely used clustering methods. We prove that in high-dimensional, high-noise settings, the algorithm exhibits catastrophic failure: with high probability, essentially every partition of the data is a fixed point. Consequently, Lloyd's algorithm simply returns its initial partition, even when the underlying clusters are trivially recoverable by other methods. In contrast, we prove that Hartigan's k-means algorithm does not exhibit this pathology. Our results show the stark difference between these algorithms and offer a theoretical explanation for the empirical difficulties often observed with k-means in high dimensions.
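The contrast described in the abstract can be explored in a small simulation. The sketch below is our own illustration, not the authors' code or experimental setup: the planted two-cluster model (n = 20 points, d = 20,000, cluster means ±μ with μ = 20·e₁, unit Gaussian noise), the random balanced initialization, and the helper names `lloyd`, `hartigan`, `kmeans_cost` and `agreement` are all hypothetical choices. `lloyd` alternates centroid updates with nearest-centroid reassignment; `hartigan` moves one point at a time whenever the transfer strictly lowers the k-means objective.

```python
# Illustrative sketch with hypothetical helpers and parameters, not the paper's code.
import numpy as np


def kmeans_cost(X, labels, k):
    """k-means objective: squared distance of every point to its cluster centroid."""
    cost = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members) > 0:
            cost += float(((members - members.mean(axis=0)) ** 2).sum())
    return cost


def lloyd(X, labels, k, max_iter=100):
    """Lloyd's algorithm; assumes no cluster ever becomes empty."""
    labels = labels.copy()
    for _ in range(max_iter):
        centroids = np.stack([X[labels == c].mean(axis=0) for c in range(k)])
        # Each point's own centroid still contains that point (the self-term).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # fixed point: no point changed cluster
        labels = new_labels
    return labels


def hartigan(X, labels, k, max_sweeps=100, tol=1e-9):
    """Hartigan's method: single-point transfers that strictly lower kmeans_cost."""
    labels = labels.copy()
    counts = np.bincount(labels, minlength=k).astype(float)
    sums = np.stack([X[labels == c].sum(axis=0) for c in range(k)])
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(X)):
            a = labels[i]
            if counts[a] <= 1:
                continue  # never empty a cluster
            d2 = ((X[i] - sums / counts[:, None]) ** 2).sum(axis=1)
            # Cost saved by removing x_i from cluster a, and cost of adding it to each cluster.
            saved = counts[a] / (counts[a] - 1) * d2[a]
            added = counts / (counts + 1) * d2
            added[a] = np.inf
            b = int(added.argmin())
            if added[b] - saved < -tol:
                # Transfer x_i from a to b and update the running sums and sizes.
                sums[a] -= X[i]
                sums[b] += X[i]
                counts[a] -= 1
                counts[b] += 1
                labels[i] = b
                moved = True
        if not moved:
            break  # no single transfer improves the objective
    return labels


def agreement(labels, truth):
    """Fraction of correctly clustered points, up to swapping the two labels."""
    acc = float(np.mean(labels == truth))
    return max(acc, 1.0 - acc)


# Hypothetical planted model: means +mu / -mu with ||mu||^2 = 400, unit Gaussian noise.
rng = np.random.default_rng(0)
n, d, k = 20, 20_000, 2
mu = np.zeros(d)
mu[0] = 20.0
truth = np.repeat([0, 1], n // 2)
X = np.where(truth[:, None] == 0, mu, -mu) + rng.standard_normal((n, d))

# Random balanced initial partition.
init = np.zeros(n, dtype=int)
init[rng.permutation(n)[: n // 2]] = 1

lloyd_labels = lloyd(X, init, k)
hart_labels = hartigan(X, init, k)
print("Lloyd returned its initialization:", np.array_equal(lloyd_labels, init))
print("agreement with truth: init %.2f, Lloyd %.2f, Hartigan %.2f" % (
    agreement(init, truth), agreement(lloyd_labels, truth), agreement(hart_labels, truth)))
print("sign of coordinate 0 alone:", agreement((X[:, 0] < 0).astype(int), truth))
```

With these parameters the self-term in each point's own centroid outweighs both the planted signal and the noise fluctuations, so `lloyd` is expected to stop after its first reassignment with the initial labels unchanged, even though the sign of the first coordinate alone separates the planted clusters. `hartigan` starts from the same partition; its `agreement` score shows how far it moves, without any claim here that it always reaches the planted partition.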
Problem

k-means
high-dimensional
catastrophic failure
clustering
noise
Innovation

k-means
high-dimensional clustering
catastrophic failure
Hartigan's algorithm
Lloyd's algorithm
🔎 Similar Papers
Roy R. Lederman
Yale University
Applied Mathematics, Data Science, Statistics, Cryo-EM, Numerical Analysis
David Silva-Sánchez
Department of Applied and Computational Mathematics, Yale University, New Haven, CT
Ziling Chen
Department of Statistics and Data Science, Yale University, New Haven, CT
Gilles Mordant
Gibbs Assistant Professor, Yale University
Statistics, Probability, Optimal Transport, Statistics in Biophysics
Amnon Balanov
School of Electrical and Computer Engineering, Tel Aviv University, Tel Aviv, Israel
Tamir Bendory
Associate Professor
mathematical signal processing, data science, cryo-electron microscopy