AI Summary
This work addresses the challenges in multi-view learning arising from severe dimensional imbalance, where high-dimensional views dominate, low-dimensional information is overlooked, and representation alignment becomes difficult. To tackle these issues, the authors propose AdaMuS, an adaptive sparse multi-view learning framework. AdaMuS employs view-specific encoders to map heterogeneous data into a unified space, incorporates a parameter-free pruning strategy to mitigate overfitting in low-dimensional views, and introduces a sparse fusion mechanism that adaptively removes redundancy and aligns multi-view representations. Furthermore, similarity graph-based self-supervised learning is leveraged to enhance model generalization. Extensive experiments on a synthetic dataset and seven real-world benchmarks demonstrate that AdaMuS achieves state-of-the-art performance in both classification and semantic segmentation tasks.
Abstract
Multi-view learning primarily aims to fuse multiple features to describe data comprehensively. Most prior studies implicitly assume that different views share similar dimensions. In practice, however, severe dimensional disparities often exist among views, leading to the unbalanced multi-view learning problem. For example, in emotion recognition tasks, video frames often reach dimensions of $10^6$, while physiological signals comprise only on the order of $10^1$ dimensions. Existing methods typically face two main challenges on this problem: (1) They are often biased toward high-dimensional views, overlooking low-dimensional ones. (2) They struggle to effectively align representations under extreme dimensional imbalance, which introduces severe redundancy into the low-dimensional views. To address these issues, we propose the Adaptive Multi-view Sparsity Learning (AdaMuS) framework. First, to avoid discarding the information in low-dimensional views, we construct view-specific encoders that map them into a unified dimensional space. Since mapping low-dimensional data to a high-dimensional space often causes severe overfitting, we design a parameter-free pruning method to adaptively remove redundant parameters in the encoders. Furthermore, we propose a sparse fusion paradigm that flexibly suppresses redundant dimensions and effectively aligns each view. Additionally, to learn representations with stronger generalization, we propose a self-supervised learning paradigm that obtains supervision signals by constructing similarity graphs. Extensive evaluations on a synthetic toy dataset and seven real-world benchmarks demonstrate that AdaMuS consistently achieves superior performance and exhibits strong generalization across both classification and semantic segmentation tasks.
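The pipeline described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions, the magnitude-based pruning criterion, and the soft-thresholding fusion below are all simplifying assumptions standing in for AdaMuS's view-specific encoders, parameter-free pruning, and sparse fusion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: one high-dimensional view (e.g. video features)
# and one low-dimensional view (e.g. physiological signals), both mapped
# into a shared unified space by view-specific linear encoders.
d_high, d_low, d_unified = 512, 8, 64
W_high = rng.normal(0.0, 0.1, (d_high, d_unified))
W_low = rng.normal(0.0, 0.1, (d_low, d_unified))

def prune_by_magnitude(W, keep_ratio=0.5):
    """Pruning sketch: zero out the smallest-magnitude weights, keeping a
    fixed fraction. AdaMuS's parameter-free criterion may differ; this
    simply illustrates removing redundant encoder parameters."""
    k = int(W.size * keep_ratio)
    thresh = np.sort(np.abs(W).ravel())[-k]  # k-th largest magnitude
    return np.where(np.abs(W) >= thresh, W, 0.0)

# Prune the low-dimensional view's encoder to curb overfitting.
W_low_pruned = prune_by_magnitude(W_low, keep_ratio=0.5)

def sparse_fuse(z_list, lam=0.05):
    """Sparse fusion sketch: average the view representations, then
    soft-threshold to suppress near-zero (redundant) dimensions."""
    z = np.mean(z_list, axis=0)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Encode one sample from each view and fuse.
x_high = rng.normal(size=d_high)
x_low = rng.normal(size=d_low)
z_high = x_high @ W_high
z_low = x_low @ W_low_pruned
z_fused = sparse_fuse([z_high, z_low])
```

The similarity-graph self-supervision is omitted here; in the paper it supplies the training signal that shapes these encoders, whereas this sketch only traces the forward pass.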