🤖 AI Summary
In density-based clustering, some existing internal cluster validity indices (CVIs) target arbitrarily shaped clusters, but none of them evaluates the quality of the noise labels, which leaves a gap in assessment. To address this, we propose DISCO (Density-based Internal Score for Clustering Outcomes), the first internal CVI that evaluates both cluster quality, in terms of compactness and separation, and the quality of noise labels. Its main contributions are: (i) the first inclusion of noise-label quality in a CVI; (ii) compactness and separation measures designed for density-based clusters of arbitrary shape; and (iii) a direct assessment of the noise labels of any given clustering. In the reported experiments, DISCO evaluates density-based clusterings more consistently than competing CVIs. DISCO is therefore the first index to evaluate the complete labeling produced by density-based clustering methods, covering both clusters and noise.
📝 Abstract
In density-based clustering, clusters are areas of high object density separated by lower object density areas. This notion supports arbitrarily shaped clusters and automatic detection of noise points that do not belong to any cluster. However, it is challenging to adequately evaluate the quality of density-based clustering results. Even though some existing cluster validity indices (CVIs) target arbitrarily shaped clusters, none of them captures the quality of the labeled noise. In this paper, we propose DISCO, a Density-based Internal Score for Clustering Outcomes, which is the first CVI that also evaluates the quality of noise labels. DISCO reliably evaluates density-based clusters of arbitrary shape by assessing compactness and separation. It also introduces a direct assessment of noise labels for any given clustering. Our experiments show that DISCO evaluates density-based clusterings more consistently than its competitors. It is additionally the first method to evaluate the complete labeling of density-based clustering methods, including noise labels.