CLUBench: A Clustering Benchmark

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic, large-scale empirical comparisons among traditional, deep learning–based, and foundation model–based clustering methods, which has hindered informed algorithm selection. The authors construct a comprehensive benchmark encompassing 24 algorithms evaluated across 131 datasets through 178,815 experiments, offering a unified assessment of these three methodological families on tabular, textual, and image data. Their analysis reveals that deep clustering methods do not consistently outperform traditional approaches; instead, combining pretrained embeddings with classical algorithms such as K-means and spectral clustering proves highly effective across modalities. The work further demonstrates that clustering remains challenging even in the era of foundation models and introduces an efficient approximate evaluation strategy that leverages the low-rank structure of performance matrices to enable rapid model selection.
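
The "pretrained embeddings + classical algorithm" recipe can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's code: k-means (Lloyd's algorithm with k-means++ seeding and several restarts) and NMI are written out in plain NumPy, and synthetic Gaussian blobs stand in for the output of a pretrained encoder, which is not specified here. The L2 normalization step and the arithmetic-mean normalization used for NMI are assumptions of this sketch.

```python
import numpy as np

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the restart with the lowest inertia."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: first center uniform, later ones proportional to squared distance.
        centers = X[[rng.integers(n)]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
            centers = np.vstack([centers, X[rng.choice(n, p=d2 / d2.sum())]])
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = dist.argmin(axis=1)
            # Move each center to the mean of its points (keep it if the cluster is empty).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = dist.min(axis=1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def nmi(a, b):
    """Normalized mutual information with arithmetic-mean normalization."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    counts = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(counts, (ai, bi), 1)
    P = counts / counts.sum()
    pa, pb = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / np.outer(pa, pb)[nz])).sum()
    ha, hb = -(pa * np.log(pa)).sum(), -(pb * np.log(pb)).sum()
    denom = (ha + hb) / 2
    return mi / denom if denom > 0 else 1.0

# Synthetic stand-in for pretrained embeddings: three Gaussian blobs in 32 dimensions.
rng = np.random.default_rng(42)
y_true = np.repeat(np.arange(3), 50)
emb = 5.0 * rng.normal(size=(3, 32))[y_true] + rng.normal(size=(150, 32))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize each embedding
y_pred = kmeans(emb, k=3)
score = nmi(y_true, y_pred)
print(f"NMI = {score:.3f}")
```

In practice `emb` would come from an image or text encoder, and the clustering step would be swapped for spectral clustering or another conventional algorithm as needed.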

📝 Abstract

Clustering is a fundamental problem in data science with a long research history that has produced numerous insightful algorithms. Despite this progress, a systematic, large-scale empirical evaluation that jointly considers conventional algorithms, deep learning-based methods, and recent foundation model-based clustering remains largely absent, leaving limited guidance for algorithm selection and deployment. To address this gap, we introduce CLUBench, a comprehensive clustering benchmark comprising 24 algorithms based on diverse principles, evaluated on 131 tabular, text, and image datasets in 178,815 experiments. Our analyses of (i) the impact of hyperparameter tuning, (ii) the impact of data types and characteristics, (iii) the impact of pretrained embeddings, (iv) large language model-based clustering, (v) the similarity of algorithms, and (vi) the low-rank structure of performance matrices yield meaningful insights and promising directions for clustering research. For instance, our study reveals that: 1) none of the evaluated deep clustering methods shows a significant advantage in average performance over the top-performing conventional clustering algorithms (e.g., KMeans, SpeClu); 2) for image and text clustering tasks, combining pretrained embeddings with conventional clustering algorithms (e.g., KMeans, SpeClu) offers effective and efficient clustering; 3) clustering remains a challenging, nontrivial problem, even in the era of increasingly dominant foundation models. Moreover, we propose exploiting the low-rank structure of cross-model performance matrices to efficiently approximate the overall performance evaluation in practical applications. We further demonstrate the feasibility of model selection based on performance matrices covering all hyperparameter configurations.
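
The abstract proposes using the low-rank structure of performance matrices to approximate the full evaluation and to support model selection, but does not spell out the estimator. One common technique, swapped in here as an assumption, is iterative truncated-SVD matrix completion ("hard impute"): run only a subset of (algorithm, dataset) experiments, fill in the remaining scores with a rank-r approximation, and rank algorithms on the completed matrix. The matrix below is synthetic and exactly rank 2 by construction; its 24 × 131 shape merely mirrors the benchmark's counts, and the 50% sampling rate and the rank are illustrative choices.

```python
import numpy as np

def complete_low_rank(M_obs, mask, rank, n_iter=500):
    """Fill unobserved entries by iterative truncated SVD (hard impute)."""
    # Initialize missing entries with each column's observed mean.
    col_mean = np.nanmean(np.where(mask, M_obs, np.nan), axis=0)
    X = np.where(mask, M_obs, col_mean[None, :])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed scores fixed; overwrite only the missing ones.
        X = np.where(mask, M_obs, low_rank)
    return X

# Synthetic rank-2 "performance matrix": rows = algorithms, columns = datasets.
rng = np.random.default_rng(0)
n_alg, n_data = 24, 131
true_perf = rng.uniform(0.2, 1.0, (n_alg, 2)) @ rng.uniform(0.2, 0.5, (2, n_data))

# Pretend only about half of the (algorithm, dataset) runs were executed.
mask = rng.random((n_alg, n_data)) < 0.5
observed = np.where(mask, true_perf, np.nan)

est_perf = complete_low_rank(observed, mask, rank=2)

# Relative error on the entries that were never "run".
miss = ~mask
rel_err = np.linalg.norm(est_perf[miss] - true_perf[miss]) / np.linalg.norm(true_perf[miss])

# Model selection: pick the algorithm with the best estimated average score.
est_best = int(np.argmax(est_perf.mean(axis=1)))
true_mean = true_perf.mean(axis=1)
regret = true_mean.max() - true_mean[est_best]
print(f"relative error on missing entries: {rel_err:.4f}, regret: {regret:.4f}")
```

Real benchmark scores are only approximately low-rank, so the rank would have to be chosen, for example by holding out some observed entries and checking reconstruction error on them.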

Problem

Research questions and friction points this paper is trying to address.

clustering benchmark

algorithm evaluation

foundation models

deep clustering

hyperparameter tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

clustering benchmark

foundation models

pretrained embeddings