🤖 AI Summary
To address the poor robustness of graph construction and high gradient noise in graph reconstruction for incomplete multi-view clustering, this paper proposes a dynamic deep graph learning framework. The method abandons static k-nearest-neighbor graphs and jointly learns view-specific dynamic graph structures and high-order semantic representations to construct a robust global graph. It introduces a masked graph reconstruction loss and a graph-structure contrastive learning mechanism to effectively suppress gradient noise. Additionally, it integrates graph convolutional embedding, graph self-attention encoding, and global-graph-guided imputation of missing views, coupled with pseudo-label-based self-supervised training. Extensive experiments on multiple real-world datasets demonstrate that the proposed method significantly outperforms existing state-of-the-art approaches in clustering performance, validating its effectiveness, robustness, and generalizability.
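The summary's core contrast is between a static k-nearest-neighbor graph, built once from raw data, and a dynamic graph that is rebuilt from the representations as they are learned. A minimal sketch of that idea follows; the function names and the top-k-on-current-embeddings construction are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def knn_graph(X, k=3):
    """Build a symmetric KNN adjacency from feature matrix X (n x d).

    A *static* graph uses this once on the raw data; noise in the raw
    features is then frozen into the topology for all of training.
    """
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                           # exclude self-loops
    A = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]                    # k nearest per node
    np.put_along_axis(A, idx, 1.0, axis=1)
    return np.maximum(A, A.T)                             # symmetrize

def dynamic_graph(Z, k=3):
    """A *dynamic* graph is rebuilt from the current learned embeddings Z
    at each step, so the topology adapts as representations improve
    (sketch only; the paper learns view-specific structures jointly)."""
    return knn_graph(Z, k=k)
```

The design point is simply that `dynamic_graph` is called on embeddings inside the training loop, whereas a static pipeline calls `knn_graph` once on raw inputs before training starts.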
📝 Abstract
The prevalence of real-world multi-view data makes incomplete multi-view clustering (IMVC) a crucial research topic. The rapid development of Graph Neural Networks (GNNs) has established them as one of the mainstream approaches for multi-view clustering. Despite significant progress in GNN-based IMVC, several challenges remain: (1) Most methods rely on the K-Nearest Neighbors (KNN) algorithm to construct static graphs from raw data, which introduces noise and diminishes the robustness of the graph topology. (2) Existing methods typically use the Mean Squared Error (MSE) loss between the reconstructed graph and the sparse adjacency graph directly as the graph reconstruction loss, leading to substantial gradient noise during optimization. To address these issues, we propose a novel **D**ynamic Deep **G**raph Learning for **I**ncomplete **M**ulti-**V**iew **C**lustering with **M**asked Graph Reconstruction Loss (DGIMVCM). Firstly, we construct a missing-robust global graph from the raw data. A graph convolutional embedding layer is then designed to extract primary features and refined, dynamic view-specific graph structures, leveraging the global graph to impute missing views. This process is complemented by graph structure contrastive learning, which identifies consistency among the view-specific graph structures. Secondly, a graph self-attention encoder is introduced to extract high-level representations from the imputed primary features and view-specific graphs; it is optimized with a masked graph reconstruction loss that mitigates gradient noise during optimization. Finally, a clustering module is constructed and optimized through a pseudo-label self-supervised training mechanism. Extensive experiments on multiple datasets validate the effectiveness and superiority of DGIMVCM.
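The abstract attributes gradient noise to taking a dense MSE between the reconstructed graph and a sparse adjacency, where the many zero entries dominate the gradient. One common way to realize a *masked* reconstruction loss is to score only the observed edges plus a sampled subset of non-edges. The sketch below follows that convention; the paper's exact masking scheme is not given here, so the sampling strategy and `neg_ratio` parameter are assumptions.

```python
import numpy as np

def masked_graph_recon_loss(A_rec, A, neg_ratio=1.0, rng=None):
    """Hedged sketch of a masked graph reconstruction loss.

    Rather than averaging squared error over every entry of the dense
    reconstruction A_rec against the sparse adjacency A, the loss is
    restricted to a mask: all observed edges plus a random sample of
    non-edges (neg_ratio negatives per positive).
    """
    rng = rng or np.random.default_rng(0)
    pos = np.argwhere(A > 0)                       # observed edges
    neg = np.argwhere(A == 0)                      # candidate non-edges
    k = min(len(neg), int(neg_ratio * len(pos)))   # negatives to keep
    neg = neg[rng.choice(len(neg), size=k, replace=False)]
    idx = np.concatenate([pos, neg])               # masked entry set
    diff = A_rec[idx[:, 0], idx[:, 1]] - A[idx[:, 0], idx[:, 1]]
    return np.mean(diff ** 2)                      # MSE on masked entries only
```

Because most entries of a sparse adjacency are zero, the masked variant keeps the positive and negative contributions roughly balanced instead of letting the zeros swamp the gradient.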