Towards Multimodal Domain Generalization with Few Labels

πŸ“… 2026-02-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of generalizing multimodal models to unseen domains under limited labeled data by introducing a novel semi-supervised multimodal domain generalization (SSMDG) setting and establishing the first SSMDG benchmarks. The proposed unified framework integrates three core mechanisms: consensus-driven consistency regularization to generate reliable pseudo-labels, disagreement-aware regularization to exploit non-consensus samples, and cross-modal prototype alignment to enforce semantic consistency across modalities. The approach effectively leverages unlabeled data while improving robustness to both domain shift and missing modalities. Extensive experiments demonstrate that the method significantly outperforms strong baselines in both standard and missing-modality scenarios.
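The consensus-driven pseudo-labeling idea described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the confidence threshold `tau`, the all-modalities agreement rule, and the function names are assumptions for illustration only.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def consensus_pseudo_labels(fused_logits, unimodal_logits_list, tau=0.95):
    """Keep a pseudo-label only when the fused prediction and every
    unimodal prediction agree on the class AND the fused confidence
    exceeds tau; remaining samples are flagged as non-consensus
    (the candidates for a disagreement-aware branch)."""
    fused_probs = softmax(fused_logits)
    fused_pred = fused_probs.argmax(axis=1)
    confident = fused_probs.max(axis=1) >= tau
    agree = np.ones(len(fused_pred), dtype=bool)
    for logits in unimodal_logits_list:
        agree &= softmax(logits).argmax(axis=1) == fused_pred
    mask = confident & agree  # True = consensus sample gets a pseudo-label
    return fused_pred, mask
```

Samples with `mask == False` would not be discarded; per the summary, they feed a separate disagreement-aware regularization term.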

πŸ“ Abstract
Multimodal models ideally should generalize to unseen domains while remaining data-efficient to reduce annotation costs. To this end, we introduce and study a new problem, Semi-Supervised Multimodal Domain Generalization (SSMDG), which aims to learn robust multimodal models from multi-source data with few labeled samples. We observe that existing approaches fail to address this setting effectively: multimodal domain generalization methods cannot exploit unlabeled data, semi-supervised multimodal learning methods ignore domain shifts, and semi-supervised domain generalization methods are confined to single-modality inputs. To overcome these limitations, we propose a unified framework featuring three key components: Consensus-Driven Consistency Regularization, which obtains reliable pseudo-labels through confident fused-unimodal consensus; Disagreement-Aware Regularization, which effectively utilizes ambiguous non-consensus samples; and Cross-Modal Prototype Alignment, which enforces domain- and modality-invariant representations while promoting robustness under missing modalities via cross-modal translation. We further establish the first SSMDG benchmarks, on which our method consistently outperforms strong baselines in both standard and missing-modality scenarios. Our benchmarks and code are available at https://github.com/lihongzhao99/SSMDG.
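The cross-modal prototype alignment component can be illustrated with a simplified sketch: compute per-class mean features ("prototypes") in each modality, then penalize the distance between matching class prototypes. This is an assumption-laden toy version; the paper's actual formulation (e.g. how prototypes are estimated across domains, or any contrastive/EMA machinery) may differ.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class: a simple class 'prototype'."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        protos[c] = features[labels == c].mean(axis=0)
    return protos

def prototype_alignment_loss(protos_a, protos_b):
    """Mean squared distance between matching class prototypes of two
    modalities; minimizing it pulls class-level semantics of the
    modalities toward a shared, modality-invariant representation."""
    return float(np.mean((protos_a - protos_b) ** 2))
```

Aligning class-level prototypes rather than individual samples is one common way to enforce semantic consistency without requiring paired examples at every step.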
Problem

Research questions and friction points this paper is trying to address.

Multimodal
Domain Generalization
Semi-Supervised Learning
Few Labels
Unseen Domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-Supervised Multimodal Domain Generalization
Consensus-Driven Consistency Regularization
Disagreement-Aware Regularization
Cross-Modal Prototype Alignment
Missing Modality Robustness
πŸ”Ž Similar Papers
No similar papers found.