🤖 AI Summary
Existing methods for evaluating unsupervised clustering models rely on fixed ground-truth structures or a single clustering paradigm, which makes them biased and inflexible when comparing models with varying numbers of clusters, diverse cluster definitions, or pairwise constraints.
Method: This paper proposes an ensemble-based consensus framework for unsupervised model ranking. It constructs a consensus matrix as a reference structure and ranks candidate models by measuring the connectivity distance between their adjacency matrices and the consensus matrix.
Contribution/Results: To our knowledge, this is the first work to integrate consensus clustering into the model evaluation phase—eliminating dependence on predefined cluster structures or a specific clustering definition, while supporting arbitrary numbers of clusters and incorporation of must-link/cannot-link constraints. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method significantly outperforms state-of-the-art evaluation metrics in identifying the model best aligned with the intrinsic consensus structure of the data, thereby enhancing both reliability and generalizability of model selection.
📝 Abstract
Evaluating the performance of clustering models is a challenging task where the outcome depends on the definition of what constitutes a cluster. Because of this, existing metrics rarely handle multiple clustering models with diverse cluster definitions, nor do they support the integration of constraints when available. In this work, we take inspiration from consensus clustering and assume that a set of clustering models is able to uncover hidden structures in the data. We propose to construct a discriminative ordering through ensemble clustering, based on the distance between a clustering model's connectivity and the consensus matrix. We first validate the proposed method on synthetic scenarios, showing that the proposed score ranks first the models that best match the consensus. We then show that this simple ranking score significantly outperforms other scoring methods when comparing sets of different clustering algorithms that are not restricted to a fixed number of clusters, and that it remains compatible with clustering constraints.
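The ranking idea described above can be sketched in a few lines: average the co-membership (adjacency) matrices of all candidate clusterings into a consensus matrix, then score each model by its distance to that consensus. The paper's exact connectivity distance is not specified in this summary, so the sketch below uses the Frobenius norm as an illustrative stand-in; the function names are hypothetical, not from the paper's code.

```python
import numpy as np

def comembership(labels):
    """Binary adjacency matrix: entry (i, j) is 1 if samples i and j
    share a cluster in this partition, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def consensus_rank(partitions):
    """Rank candidate clusterings by distance to the ensemble consensus.

    partitions: list of label arrays, one per candidate model
                (models may use different numbers of clusters).
    Returns model indices ordered best-first, i.e. closest to the
    consensus matrix first.
    """
    mats = [comembership(p) for p in partitions]
    # Consensus (co-association) matrix: fraction of models that
    # place each pair of samples in the same cluster.
    consensus = np.mean(mats, axis=0)
    # Illustrative choice of distance: Frobenius norm between each
    # model's adjacency matrix and the consensus matrix.
    dists = [np.linalg.norm(m - consensus) for m in mats]
    return list(np.argsort(dists))

# Toy usage: two models agree, one partitions the data differently,
# so the outlier model is ranked last.
partitions = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(consensus_rank(partitions))  # outlier model (index 2) comes last
```

Note that nothing here requires the candidate models to share a cluster count or a clustering definition: only their co-membership structure is compared, which is what makes the approach paradigm-agnostic.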