k-HyperEdge Medoids for Clustering Ensemble

📅 2024-12-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Clustering ensemble methods face two difficulties: clustering-view methods are efficient but sensitive to unreliable base clusterings, while sample-view methods perform well but must construct pairwise sample relations, which is time-consuming. This paper reformulates clustering ensemble as a *k-HyperEdge Medoids* discovery problem on a hypergraph. The method first selects a set of hyperedges efficiently from the clustering view, then diffuses and adjusts them from the sample view under the guidance of a hyperedge loss function, reducing the loss mainly by assigning each sample to the hyperedge with the highest degree of belonging. Theoretical analysis shows that the solution approximates the optimum, that the assignment step gradually reduces the loss, and that the belonging-degree estimate is statistically reasonable. Experiments on artificial data illustrate the working mechanism, and experiments on twenty data sets verify convergence, effectiveness, and efficiency against nine representative clustering ensemble algorithms.

📝 Abstract
Clustering ensemble has been a popular research topic in data science because it improves on the robustness of a single clustering method. Many clustering ensemble methods have been proposed, most of which can be categorized as clustering-view or sample-view methods. Clustering-view methods are generally efficient, but they can be affected by the unreliability of the base clustering results. Sample-view methods perform well, but constructing the pairwise sample relation is time-consuming. In this paper, clustering ensemble is formulated as a k-HyperEdge Medoids discovery problem, and a clustering ensemble method based on k-HyperEdge Medoids is proposed that draws on the characteristics of both types of method. The method first selects a set of hyperedges efficiently from the clustering view; the hyperedges are then diffused and adjusted from the sample view, guided by a hyperedge loss function, to construct an effective k-HyperEdge Medoid set. The loss function is reduced mainly by assigning each sample to the hyperedge to which it has the highest degree of belonging. Theoretical analyses show that the solution can approximate the optimum, that the assignment method gradually reduces the loss function, and that the estimation of the belonging degree is statistically reasonable. Experiments on artificial data illustrate the working mechanism of the proposed method. Its convergence is verified by experimental analysis on twenty data sets. Its effectiveness and efficiency are also verified on these data sets, with nine representative clustering ensemble algorithms as reference.
Problem

Research questions and friction points this paper is trying to address.

Proposes k-HyperEdge Medoids to combine clustering and sample views
Addresses unreliability in base clustering results efficiently
Reduces time consumption in pairwise sample relation construction
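
To make the cost contrast concrete, the following minimal sketch builds both representations from the same base clusterings. The hyperedge list (one set of sample indices per base cluster) stands for the clustering view, and a co-association matrix stands for the sample view. Co-association is a common pairwise relation in the clustering ensemble literature, but the summary above does not say which relation the paper has in mind, so treat it as an assumption. The function names and the toy labels are hypothetical.

```python
from collections import defaultdict

def build_hyperedges(base_clusterings):
    """Clustering view: each cluster of each base clustering becomes one hyperedge."""
    hyperedges = []
    for labels in base_clusterings:
        groups = defaultdict(set)
        for sample, label in enumerate(labels):
            groups[label].add(sample)
        hyperedges.extend(groups[label] for label in sorted(groups))
    return hyperedges  # one pass per clustering: O(m * n)

def co_association(base_clusterings):
    """Sample view (assumed relation): fraction of clusterings that co-cluster each pair."""
    m = len(base_clusterings)
    n = len(base_clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Every pair is visited for every clustering: O(m * n^2)
    return [[c / m for c in row] for row in counts]

# Toy input: three base clusterings of six samples (illustrative only).
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
edges = build_hyperedges(base)
co = co_association(base)
```

The hyperedge list grows linearly with the number of samples, whereas the pairwise matrix grows quadratically, which is the scalability concern named in the list above.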
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulates clustering ensemble as k-HyperEdge Medoids discovery
Selects hyperedges efficiently from clustering view
Adjusts hyperedges via diffusion with loss function
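
The three steps above can be sketched as a small loop: pick k initial hyperedges, score how strongly each sample belongs to each one, reassign every sample to its best hyperedge, and repeat until nothing changes. This is only an illustration under stated assumptions. The summary does not give the paper's initial selection rule, hyperedge loss function, or belonging-degree estimate, so the greedy selection and the Jaccard-based belonging degree below are hypothetical stand-ins, as are all function names.

```python
def jaccard(a, b):
    """Overlap between two hyperedges (sets of sample indices)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def select_initial(hyperedges, k):
    """ASSUMPTION: greedily pick k mutually dissimilar hyperedges, largest first."""
    chosen = [max(hyperedges, key=len)]
    while len(chosen) < k:
        # Add the hyperedge whose highest similarity to any chosen one is lowest.
        chosen.append(min(hyperedges,
                          key=lambda e: max(jaccard(e, c) for c in chosen)))
    return [set(c) for c in chosen]

def belonging(sample, medoid, hyperedges):
    """ASSUMPTION: mean similarity between the medoid and the hyperedges holding the sample."""
    containing = [e for e in hyperedges if sample in e]
    if not containing:
        return 0.0
    return sum(jaccard(e, medoid) for e in containing) / len(containing)

def k_hyperedge_sketch(hyperedges, n_samples, k, max_iter=20):
    medoids = select_initial(hyperedges, k)
    assignment = None
    for _ in range(max_iter):
        # Assign each sample to the medoid hyperedge with the highest belonging degree.
        new_assignment = [
            max(range(k), key=lambda j: belonging(s, medoids[j], hyperedges))
            for s in range(n_samples)
        ]
        if new_assignment == assignment:
            break  # assignments are stable
        assignment = new_assignment
        # Adjust each medoid hyperedge to the set of samples now assigned to it.
        medoids = [{s for s in range(n_samples) if assignment[s] == j}
                   for j in range(k)]
    return assignment

# Hyperedges from three toy base clusterings of six samples (illustrative only).
toy_hyperedges = [{0, 1, 2}, {3, 4, 5}, {0, 1}, {2, 3, 4, 5}, {0, 1, 2, 3}, {4, 5}]
labels = k_hyperedge_sketch(toy_hyperedges, n_samples=6, k=2)
```

Note that the loop never builds an n-by-n sample matrix: it only compares samples against hyperedges, which is the efficiency argument the list above makes for working with a k-HyperEdge Medoid set.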
Feijiang Li
Institute of Big Data Science and Industry, Shanxi University
Jieting Wang
Institute of Big Data Science and Industry, Shanxi University
Liuya Zhang
Institute of Big Data Science and Industry, Shanxi University
Yuhua Qian
Institute of Big Data Science and Industry, Shanxi University
Machine learning, data mining, complex networks
Shuai Jin
Institute of Big Data Science and Industry, Shanxi University
Tao Yan
Institute of Big Data Science and Industry, Shanxi University
Liang Du
Associate Professor, Villanova University
electric power systems