Benchmarking of Clustering Validity Measures Revisited

📅 2025-11-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper systematically evaluates the performance of 26 internal clustering validity indices (CVIs) to address the fundamental problem of reliably selecting the best clustering solution from a set of candidates. It proposes a framework of three complementary evaluation sub-methodologies, each designed to avoid the potential biases of the others and each employing two complementary performance metrics (e.g., stability and accuracy) to assess CVIs along three dimensions: robustness, scenario adaptability, and algorithmic independence. The benchmarking infrastructure comprises 16,177 synthetic and real-world datasets, eight widely used clustering algorithms, and an enhanced evaluation protocol, constituting the largest CVI benchmark to date. Experimental results reveal systematic strengths and weaknesses of each CVI across diverse data characteristics, including cluster shape, noise level, and dimensionality. The study delivers an interpretable, reproducible, and empirically grounded guideline for CVI selection in practical clustering applications.
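As a concrete illustration of the selection problem the paper studies, the sketch below ranks candidate k-means partitions of a toy dataset using three classic internal CVIs that happen to be available in scikit-learn (silhouette, Calinski-Harabasz, Davies-Bouldin). This is only a minimal, assumed setup; it does not reproduce the paper's 26 indices or its evaluation framework.

```python
# Minimal sketch: use three classic internal CVIs to pick the "best" number of
# clusters from a set of candidate k-means partitions. Illustrative only; the
# paper benchmarks 26 indices under a far more elaborate protocol.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=0)

scores = {"silhouette": {}, "calinski_harabasz": {}, "davies_bouldin": {}}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores["silhouette"][k] = silhouette_score(X, labels)                # higher is better
    scores["calinski_harabasz"][k] = calinski_harabasz_score(X, labels)  # higher is better
    scores["davies_bouldin"][k] = davies_bouldin_score(X, labels)        # lower is better

best_k = {
    "silhouette": max(scores["silhouette"], key=scores["silhouette"].get),
    "calinski_harabasz": max(scores["calinski_harabasz"], key=scores["calinski_harabasz"].get),
    "davies_bouldin": min(scores["davies_bouldin"], key=scores["davies_bouldin"].get),
}
print(best_k)  # the indices may disagree, which is exactly why benchmarking them matters
```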

📝 Abstract
Validation plays a crucial role in the clustering process. Many different internal validity indexes exist for the purpose of determining the best clustering solution(s) from a given collection of candidates, e.g., as produced by different algorithms or different algorithm hyper-parameters. In this study, we present a comprehensive benchmark study of 26 internal validity indexes, which includes highly popular classic indexes as well as more recently developed ones. We adopted an enhanced revision of the methodology presented in Vendramin et al. (2010), developed here to address several shortcomings of this previous work. This overall new approach consists of three complementary custom-tailored evaluation sub-methodologies, each of which has been designed to assess specific aspects of an index's behaviour while preventing potential biases of the other sub-methodologies. Each sub-methodology features two complementary measures of performance, alongside mechanisms that allow for an in-depth investigation of more complex behaviours of the internal validity indexes under study. Additionally, a new collection of 16177 datasets has been produced, paired with eight widely-used clustering algorithms, for a wider applicability scope and representation of more diverse clustering scenarios.
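To make the evaluation idea more tangible, the hedged sketch below checks how well the partition preferred by one internal index (silhouette, used here only as a stand-in) agrees with a dataset's ground-truth labelling, measured by the Adjusted Rand Index. This is a deliberately simplified illustration of an accuracy-style assessment and not the paper's three sub-methodologies, its 26 indices, or its full collection of datasets and algorithms.

```python
# Hedged sketch of an external-agreement check for an internal CVI: the partition
# preferred by the index is compared against ground truth with the Adjusted Rand
# Index (ARI). Simplified to one index, one algorithm, and a few toy datasets.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score


def evaluate_index_on_dataset(X, y_true, k_range=range(2, 11)):
    """Return ARI between ground truth and the partition chosen by the silhouette index."""
    candidates = {
        k: KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        for k in k_range
    }
    chosen_k = max(candidates, key=lambda k: silhouette_score(X, candidates[k]))
    return adjusted_rand_score(y_true, candidates[chosen_k])


# Average agreement over a handful of synthetic datasets (the paper uses 16177).
aris = []
for seed in range(5):
    n_centers = int(np.random.RandomState(seed).randint(2, 8))
    X, y = make_blobs(n_samples=400, centers=n_centers, random_state=seed)
    aris.append(evaluate_index_on_dataset(X, y))
print(f"mean ARI of silhouette-selected partitions: {np.mean(aris):.3f}")
```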
Problem

Research questions and friction points this paper is trying to address.

Evaluating 26 internal clustering validity indexes comprehensively
Developing enhanced methodology to address previous benchmarking limitations
Testing indexes on 16177 datasets with eight clustering algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed three complementary custom-tailored evaluation sub-methodologies
Created a new collection of 16177 datasets for diverse scenarios
Adopted enhanced methodology to address previous study shortcomings
Connor Simpson
School of Information and Physical Sciences, The University of Newcastle
Ricardo J. G. B. Campello
Full Professor, Dept. of Mathematics and Computer Science, University of Southern Denmark
Data Science · Data Mining · Machine Learning
Elizabeth Stojanovski
School of Information and Physical Sciences, The University of Newcastle