🤖 AI Summary
To address two challenges in clustering ensembles, namely unreliable base clusterings (a weakness of clustering-view methods) and the costly construction of pairwise sample relations (a weakness of sample-view methods), this paper proposes a *k-HyperEdge Medoids* discovery framework. It reformulates the clustering ensemble as the problem of finding a set of *k* hyperedge medoids on a hypergraph. The method first selects a set of hyperedges efficiently from the clustering view, then diffuses and adjusts them from the sample view under the guidance of a hyperedge loss function, which is reduced mainly by assigning each sample to the hyperedge with the highest degree of belonging. Theoretical analysis shows that the solution can approximate the optimal one, that the assignment step gradually reduces the loss, and that the estimate of the belonging degree is statistically reasonable. Experiments on artificial data illustrate the working mechanism. Experiments on 20 data sets verify convergence and compare effectiveness and efficiency against nine representative clustering ensemble algorithms.
📝 Abstract
Clustering ensemble has been a popular research topic in data science because it can improve on the robustness of a single clustering method. Many clustering ensemble methods have been proposed, most of which can be categorized as clustering-view or sample-view methods. Clustering-view methods are generally efficient, but they can be affected by unreliability in the base clustering results. Sample-view methods perform well, but constructing the pairwise sample relation is time-consuming. In this paper, the clustering ensemble is formulated as a k-HyperEdge Medoids discovery problem, and a clustering ensemble method based on k-HyperEdge Medoids is proposed that combines the characteristics of both types of methods. In the method, a set of hyperedges is first selected efficiently from the clustering view; the hyperedges are then diffused and adjusted from the sample view, guided by a hyperedge loss function, to construct an effective k-HyperEdge Medoid set. The loss function is reduced mainly by assigning each sample to the hyperedge with the highest degree of belonging. Theoretical analyses show that the solution can approximate the optimal one, that the assignment method gradually reduces the loss function, and that the estimation of the belonging degree is statistically reasonable. Experiments on artificial data show the working mechanism of the proposed method. The convergence of the method is verified by experimental analysis on twenty data sets. The effectiveness and efficiency of the proposed method are also verified on these data sets, with nine representative clustering ensemble algorithms as reference.
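
The two-stage idea described in the abstract (select hyperedges from the clustering view, then adjust them by reassigning samples) can be illustrated with a small sketch. This is not the authors' algorithm: the greedy coverage-based selection, the Jaccard-based belonging degree, and all function names below are assumptions chosen only to make the idea concrete. Each cluster of each base clustering is treated as one hyperedge, and no pairwise sample-by-sample matrix is built.

```python
def hyperedges_from(base_clusterings):
    """Turn every cluster of every base clustering into one hyperedge (a frozenset of sample ids)."""
    edges = []
    for labels in base_clusterings:
        groups = {}
        for i, lab in enumerate(labels):
            groups.setdefault(lab, set()).add(i)
        edges.extend(frozenset(g) for g in groups.values())
    return edges


def jaccard(a, b):
    """Jaccard overlap of two sample sets; `a` is always non-empty here."""
    return len(a & b) / len(a | b)


def select_initial(edges, k):
    """Clustering view (assumed heuristic): greedily pick k hyperedges that cover the most new samples."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(edges, key=lambda e: len(e - covered))
        chosen.append(set(best))
        covered |= best
    return chosen


def belonging(medoid, edges_of_sample):
    """Assumed belonging degree: mean overlap between a medoid and the base clusters containing the sample."""
    return sum(jaccard(e, medoid) for e in edges_of_sample) / len(edges_of_sample)


def k_hyperedge_medoids_sketch(base_clusterings, k, n_iter=10):
    """Toy iteration: assign each sample to its best medoid, then rebuild the medoids from the assignment."""
    edges = hyperedges_from(base_clusterings)
    n = len(base_clusterings[0])
    containing = [[e for e in edges if i in e] for i in range(n)]
    medoids = select_initial(edges, k)
    labels = None
    for _ in range(n_iter):
        # Sample view: each sample goes to the hyperedge with the highest belonging degree.
        new_labels = [
            max(range(k), key=lambda j: belonging(medoids[j], containing[i]))
            for i in range(n)
        ]
        if new_labels == labels:  # assignments are stable, stop
            break
        labels = new_labels
        medoids = [{i for i in range(n) if labels[i] == j} for j in range(k)]
    return labels


# Three base clusterings of six samples; the third one disagrees about sample 2.
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
labels = k_hyperedge_medoids_sketch(base, k=2)
print(labels)
```

In this toy input the greedy step initially picks the larger, less reliable hyperedge from the third base clustering, and the reassignment step moves sample 2 back with samples 0 and 1. The actual loss function, diffusion step, and belonging-degree estimator in the paper may differ substantially from these stand-ins.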