Universal NP-Hardness of Clustering under General Utilities

📅 2026-02-27
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the practical instability of clustering algorithms, often attributed to sensitivity to initialization and hyperparameters, by proposing the Universal Clustering Problem (UCP) framework. The UCP formalizes ten prominent clustering paradigms, including k-means, Gaussian Mixture Models, and DBSCAN, as optimization problems that maximize a polynomial-time computable partition utility over a finite metric space. Using two independent polynomial-time reductions, one from graph coloring and one from Exact Cover by 3-Sets (X3C), the study proves that UCP is NP-hard, so each mapped paradigm inherits this intractability. The authors present this as a unified theoretical foundation for mainstream clustering paradigms, one that exposes their inherent limitations from a computational complexity perspective. The result offers a rigorous basis for understanding common failure modes in clustering and motivates the design of new objective functions aimed at improving algorithmic stability.

📝 Abstract
Clustering is a central primitive in unsupervised learning, yet practice is dominated by heuristics whose outputs can be unstable and highly sensitive to representations, hyperparameters, and initialisation. Existing theoretical results are largely objective-specific and do not explain these behaviours at a unifying level. We formalise the common optimisation core underlying diverse clustering paradigms by defining the Universal Clustering Problem (UCP): the maximisation of a polynomial-time computable partition utility over a finite metric space. We prove the NP-hardness of UCP via two independent polynomial-time reductions from graph colouring and from exact cover by 3-sets (X3C). By mapping ten major paradigms (including k-means, GMMs, DBSCAN, spectral clustering, and affinity propagation) to the UCP framework, we demonstrate that each inherits this fundamental intractability. Our results provide a unified explanation for characteristic failure modes, such as local optima in alternating methods and greedy merge-order traps in hierarchical clustering. Finally, we show that clustering limitations reflect interacting computational and epistemic constraints, motivating a shift toward stability-aware objectives and interaction-driven formulations with explicit guarantees.
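
To make the UCP definition concrete, here is a minimal Python sketch of "maximise a polynomial-time computable partition utility over a finite metric space", solved by exhaustive search. It is an illustration under stated assumptions, not the paper's construction or either of its reductions. The names `set_partitions`, `kmeans_style_utility`, and `solve_ucp` are hypothetical, and the k-means-style utility (negative within-cluster sum of squares, with at most k clusters allowed) is one plausible way a paradigm could be expressed as a utility.

```python
from itertools import combinations
from typing import Callable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]
Partition = List[List[int]]  # blocks of point indices


def set_partitions(n: int) -> Iterator[Partition]:
    """Yield every partition of the indices 0..n-1 into non-empty blocks."""
    def extend(i: int, blocks: Partition) -> Iterator[Partition]:
        if i == n:
            yield [list(b) for b in blocks]  # copy, since blocks is mutated
            return
        for b in blocks:          # put index i into an existing block
            b.append(i)
            yield from extend(i + 1, blocks)
            b.pop()
        blocks.append([i])        # or open a new block for index i
        yield from extend(i + 1, blocks)
        blocks.pop()
    yield from extend(0, [])


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_style_utility(points: Sequence[Point], k: int) -> Callable[[Partition], float]:
    """Assumed utility: negative within-cluster sum of squares, at most k clusters."""
    def utility(partition: Partition) -> float:
        if len(partition) > k:
            return float("-inf")  # infeasible: too many clusters
        cost = 0.0
        for block in partition:
            # Sum of squared distances to the block mean equals the sum of
            # pairwise squared distances divided by the block size.
            pair_sum = sum(sq_dist(points[i], points[j])
                           for i, j in combinations(block, 2))
            cost += pair_sum / len(block)
        return -cost
    return utility


def solve_ucp(n: int, utility: Callable[[Partition], float]) -> Tuple[Partition, float]:
    """Exhaustively search all partitions of n points for the best utility."""
    best, best_u = None, float("-inf")
    for partition in set_partitions(n):
        u = utility(partition)
        if best is None or u > best_u:
            best, best_u = partition, u
    return best, best_u
```

Each utility evaluation here is cheap, but the search space is the number of set partitions (the Bell numbers): 15 for 4 points and 115,975 for 10 points. That growth is why exhaustive search is not a practical option; NP-hardness suggests the difficulty is not merely an artefact of this naive strategy.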
Problem

Research questions and friction points this paper is trying to address.

clustering
NP-hardness
unstable outputs
theoretical unification
computational intractability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Universal Clustering Problem
NP-hardness
computational intractability
clustering paradigms
stability-aware objectives