🤖 AI Summary
This paper systematically evaluates 26 internal clustering validity indices (CVIs) to address a fundamental problem: reliably selecting the best clustering solution from a set of candidates. It proposes a framework of three complementary evaluation sub-methodologies, each designed to avoid the potential biases of the others and each using two complementary performance measures (e.g., stability and accuracy), to assess CVIs along three dimensions: robustness, scenario adaptability, and algorithmic independence. The benchmarking infrastructure comprises 16,177 synthetic and real-world datasets, eight widely used clustering algorithms, and an enhanced evaluation protocol, constituting the largest CVI benchmark to date. Experimental results reveal systematic strengths and weaknesses of each CVI across diverse data characteristics, including cluster shape, noise level, and dimensionality. The study delivers an interpretable, reproducible, and empirically grounded guideline for CVI selection in practical clustering applications.
📝 Abstract
Validation plays a crucial role in the clustering process. Many different internal validity indexes exist for determining the best clustering solution(s) from a given collection of candidates, e.g., as produced by different algorithms or different algorithm hyper-parameters. In this study, we present a comprehensive benchmark of 26 internal validity indexes, including highly popular classic indexes as well as more recently developed ones. We adopt an enhanced revision of the methodology presented in Vendramin et al. (2010), developed here to address several shortcomings of that earlier work. The new approach consists of three complementary, custom-tailored evaluation sub-methodologies, each designed to assess specific aspects of an index's behaviour while preventing potential biases of the other sub-methodologies. Each sub-methodology features two complementary measures of performance, alongside mechanisms that allow for an in-depth investigation of more complex behaviours of the internal validity indexes under study. Additionally, a new collection of 16,177 datasets has been produced and paired with eight widely used clustering algorithms, to widen the scope of applicability and represent more diverse clustering scenarios.
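As a minimal illustration of the task an internal validity index performs, the sketch below scores several candidate partitions of one toy dataset and keeps the highest-scoring one. It assumes the mean silhouette width as the index, chosen here only because it is a well-known classic; the abstract does not list which 26 indexes are benchmarked. The toy points, the candidate names, and the `silhouette` helper are all hypothetical and are not part of the benchmark.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean silhouette width of a partition (higher means better separated)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate partitions, standing in for the output of
# different algorithms or hyper-parameter settings.
candidates = {
    "two_clusters": [0, 0, 0, 1, 1, 1],
    "three_clusters": [0, 0, 1, 2, 2, 2],
    "mixed": [0, 1, 0, 1, 0, 1],
}

# The index is used purely as a ranking function over the candidates.
best = max(candidates, key=lambda name: silhouette(points, candidates[name]))
print(best)  # two_clusters
```

No ground-truth labels appear anywhere in the selection step, which is what makes the index internal. A benchmark of the kind described above asks how often such a ranking agrees with the truly best candidate.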