A Semi-supervised Generative Model for Incomplete Multi-view Data Integration with Missing Labels

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of view incompleteness and label scarcity in multi-view data, this paper proposes a semi-supervised generative multi-view learning model. The model jointly optimizes an information bottleneck objective on labeled data and a variational likelihood objective on unlabeled data, so that both objectives shape a single shared latent space. It encourages multi-view consistency through cross-view mutual information maximization in that latent space, and it uses a product-of-experts architecture to aggregate whichever views are present, which makes it robust to arbitrary view-missing patterns. The main contribution is extending the information bottleneck principle, which on its own is fully supervised, to a semi-supervised generative framework. Experiments on image and multi-omics datasets show better classification and missing-view imputation performance than existing approaches in settings with missing views and limited labeled samples.

📝 Abstract
Multi-view learning is widely applied to real-life datasets, such as multiple omics biological data, but it often suffers from both missing views and missing labels. Prior probabilistic approaches addressed the missing view problem by using a product-of-experts scheme to aggregate representations from present views and achieved superior performance over deterministic classifiers, using the information bottleneck (IB) principle. However, the IB framework is inherently fully supervised and cannot leverage unlabeled data. In this work, we propose a semi-supervised generative model that utilizes both labeled and unlabeled samples in a unified framework. Our method maximizes the likelihood of unlabeled samples to learn a latent space shared with the IB on labeled data. We also perform cross-view mutual information maximization in the latent space to enhance the extraction of shared information across views. Compared to existing approaches, our model achieves better predictive and imputation performance on both image and multi-omics data with missing views and limited labeled samples.
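The product-of-experts aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each view's encoder outputs a diagonal Gaussian (mean and log-variance) and that a standard normal prior acts as an extra expert; these choices, and the function name `poe_gaussian`, are assumptions for illustration and are not taken from the paper. Missing views are simply skipped, so any missing-view pattern yields a valid joint posterior.

```python
import math

def poe_gaussian(mus, logvars, present, include_prior=True):
    """Fuse per-view diagonal Gaussians into one Gaussian (product of experts).

    mus, logvars: one list of floats per view (mean and log-variance).
    present:      one bool per view; False marks a missing view.
    Hypothetical helper: the N(0, I) prior expert is an assumption.
    """
    dim = len(mus[0])
    # The prior expert N(0, I) contributes precision 1 and mean 0.
    precision = [1.0 if include_prior else 0.0] * dim
    weighted_mean = [0.0] * dim
    for mu, logvar, is_present in zip(mus, logvars, present):
        if not is_present:
            continue  # a missing view contributes no expert
        for d in range(dim):
            p = math.exp(-logvar[d])  # precision = 1 / variance
            precision[d] += p
            weighted_mean[d] += p * mu[d]
    if any(p == 0.0 for p in precision):
        raise ValueError("no expert available: all views missing and no prior")
    joint_var = [1.0 / p for p in precision]
    joint_mu = [v * w for v, w in zip(joint_var, weighted_mean)]
    return joint_mu, joint_var

# Two views with unit variance; the second call drops view 2.
mus = [[2.0, 0.0], [4.0, 0.0]]
logvars = [[0.0, 0.0], [0.0, 0.0]]
both_mu, both_var = poe_gaussian(mus, logvars, [True, True])
one_mu, one_var = poe_gaussian(mus, logvars, [True, False])
```

With fewer views present, the fused variance grows, which reflects the reduced evidence about the latent code; with no views present, the sketch falls back to the prior.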
Problem

Research questions and friction points this paper is trying to address.

Integrating incomplete multi-view data with missing labels
Leveraging unlabeled data in a semi-supervised generative framework
Enhancing cross-view information extraction with mutual information maximization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised generative model with unified framework
Latent space learning via likelihood maximization
Cross-view mutual information maximization technique
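The page does not say which estimator the paper uses for cross-view mutual information maximization. As one common option, the sketch below uses an InfoNCE-style contrastive loss between the latent codes of two views: minimizing it maximizes a lower bound on the mutual information (at most log N for a batch of N pairs). The choice of InfoNCE, the dot-product similarity, the `temperature` parameter, and the function name `info_nce` are all assumptions for illustration.

```python
import math

def info_nce(z_a, z_b, temperature=1.0):
    """InfoNCE loss for paired latent codes from two views.

    z_a[i] and z_b[i] are codes of the same sample (the positive pair);
    z_b[j] for j != i serve as negatives. Assumed estimator, not the
    paper's confirmed choice.
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        # Dot-product similarity between view-A code i and every view-B code.
        scores = [
            sum(a * b for a, b in zip(z_a[i], z_b[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over the batch.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of picking the true pair.
        total += log_norm - scores[i]
    return total / n

# Aligned codes give a low loss; mismatched pairs give a high loss.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(view_a, [[1.0, 0.0], [0.0, 1.0]], temperature=0.1)
shuffled = info_nce(view_a, [[0.0, 1.0], [1.0, 0.0]], temperature=0.1)
```

In a training loop, a term like this would be added to the labeled and unlabeled objectives so that codes from different views of the same sample are pulled together.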
Yiyang Shen
University of Iowa, Iowa City, Iowa, USA
Weiran Wang
University of Iowa
Machine learning, speech processing