🤖 AI Summary
Multi-view feature learning suffers from inadequate modeling of view consistency, hindering cross-view semantic alignment and discriminative representation learning. To address this, we propose the Hierarchical Consensus Network (HCN), the first framework to establish a three-tiered consensus mechanism—classification consensus, encoding consensus, and global consensus—that jointly enforces consistency across views. Classification consensus unifies canonical correlation analysis (CCA) and contrastive learning paradigms; encoding consensus integrates discriminative and structural information; and global consensus coordinates both via hierarchical constraint optimization. By synergistically combining CCA, contrastive learning, and multi-level regularization, HCN achieves significant improvements over state-of-the-art methods on four benchmark multi-view datasets, enhancing both feature discriminability and cross-view robustness. Our core contribution lies in the principled design of the three-level consensus framework and its unified formalization of view consistency.
📝 Abstract
Multiview feature learning aims to learn discriminative features by integrating the distinct information in each view. However, most existing methods still face significant challenges in learning view-consistency features, which are crucial for effective multiview learning. Motivated by the theories of CCA and contrastive learning in multiview feature learning, we propose the hierarchical consensus network (HCN) in this paper. The HCN derives three consensus indices for capturing the hierarchical consensus across views, which are classifying consensus, coding consensus, and global consensus, respectively. Specifically, classifying consensus reinforces class-level correspondence between views from a CCA perspective, while coding consensus closely resembles contrastive learning and reflects contrastive comparison of individual instances. Global consensus aims to extract consensus information from two perspectives simultaneously. By enforcing the hierarchical consensus, the information within each view is better integrated to obtain more comprehensive and discriminative features. The extensive experimental results obtained on four multiview datasets demonstrate that the proposed method significantly outperforms several state-of-the-art methods.