Boomda: Balanced Multi-objective Optimization for Multimodal Domain Adaptation

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal domain adaptation faces two key challenges: high annotation costs leave labeled data scarce in the target domain, and each modality undergoes a different distribution shift between the source and target domains, which hinders unified alignment. To address these, we propose a balanced multi-objective optimization framework. First, we apply the information bottleneck method to learn a representation for each modality independently. Second, we use correlation alignment to match the source and target domains in the representation space. Third, we formulate the balancing of domain alignment across modalities as a multi-objective optimization problem that targets a Pareto-optimal solution; exploiting model-specific properties reduces it to a quadratic programming problem, and a further approximation yields a closed-form solution, giving an efficient, modality-balanced algorithm. Extensive experiments on multiple benchmark datasets show that the method outperforms competing schemes.

📝 Abstract
Multimodal learning, while contributing to numerous success stories across various fields, faces the challenge of prohibitively expensive manual annotation. To address the scarcity of annotated data, a popular solution is unsupervised domain adaptation, which has been extensively studied in unimodal settings yet remains less explored in multimodal settings. In this paper, we investigate heterogeneous multimodal domain adaptation, where the primary challenge is the varying domain shifts of different modalities from the source to the target domain. We first introduce the information bottleneck method to learn representations for each modality independently, and then match the source and target domains in the representation space with correlation alignment. To balance the domain alignment of all modalities, we formulate the problem as a multi-objective task, aiming for a Pareto optimal solution. By exploiting the properties specific to our model, the problem can be simplified to a quadratic programming problem. Further approximation yields a closed-form solution, leading to an efficient modality-balanced multimodal domain adaptation algorithm. The proposed method features Balanced multi-objective optimization for multimodal domain adaptation, termed Boomda. Extensive empirical results showcase the effectiveness of the proposed approach and demonstrate that Boomda outperforms the competing schemes. The code is available at: https://github.com/sunjunaimer/Boomda.git.
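
To make the correlation alignment step concrete, here is a minimal NumPy sketch of a CORAL-style loss for a single modality: the squared Frobenius distance between the source and target feature covariance matrices. The function name `coral_loss`, the 1 / (4 d^2) scaling, and the synthetic features are illustrative assumptions, not code or settings taken from the Boomda repository.

```python
import numpy as np

def coral_loss(source_feats, target_feats):
    """CORAL-style loss: squared Frobenius distance between the feature
    covariance matrices of two domains, scaled by 1 / (4 d^2).
    (Hypothetical helper; the scaling is an assumption.)"""
    d = source_feats.shape[1]
    # rowvar=False: rows are samples, columns are feature dimensions.
    cov_s = np.atleast_2d(np.cov(source_feats, rowvar=False))
    cov_t = np.atleast_2d(np.cov(target_feats, rowvar=False))
    return float(np.sum((cov_s - cov_t) ** 2) / (4 * d * d))

# Synthetic stand-ins for one modality's representations in each domain.
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 8))
tgt = 2.0 * rng.normal(size=(150, 8)) + 1.0  # different scale and mean

print(coral_loss(src, src))  # identical inputs give 0.0
print(coral_loss(src, tgt))  # positive when the covariances differ
```

In a multimodal setting one such loss would be computed per modality, which is what gives rise to several alignment objectives that then need to be balanced.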
Problem

Research questions and friction points this paper is trying to address.

Addresses multimodal domain adaptation with varying domain shifts across modalities
Balances domain alignment of all modalities through multi-objective optimization
Reduces the need for expensive manual annotation in multimodal learning via unsupervised domain adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Independent modality representation learning via information bottleneck
Correlation alignment for cross-domain representation matching
Multi-objective optimization for balanced multimodal domain adaptation
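
As a rough illustration of how balancing objectives can reduce to a small quadratic program with a closed-form answer, the sketch below computes the standard min-norm weighting for two gradients, as used in gradient-based multi-objective optimization. It is not the closed form derived in the paper, which is not reproduced here; the function name `min_norm_weights` and the two-modality setup are assumptions made for the example.

```python
import numpy as np

def min_norm_weights(g1, g2):
    """Weights (a, 1 - a), a in [0, 1], minimising ||a*g1 + (1-a)*g2||^2.
    Generic two-objective min-norm solution, assumed here for illustration;
    it is not Boomda's own approximation."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: every weighting gives the same direction.
        return 0.5, 0.5
    # Unconstrained minimiser of the quadratic, clipped onto [0, 1].
    alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha, 1.0 - alpha

# Hypothetical alignment-loss gradients for two modalities.
g_audio = np.array([1.0, 0.0])
g_video = np.array([0.0, 1.0])
a, b = min_norm_weights(g_audio, g_video)
common_direction = a * g_audio + b * g_video
print(a, b, common_direction)
```

When the two gradients conflict, the min-norm combination gives a direction that does not increase either objective to first order; when one gradient dominates in norm, clipping puts all the weight on the smaller one.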
Jun Sun
Zhejiang Lab, Hangzhou, China
Xinxin Zhang
Department of Electrical Engineering, Technical University of Denmark
Functional Modelling, Artificial Intelligence, Alarm Design
Simin Hong
Jian Zhu
Zhejiang Lab, Hangzhou, China
Xiang Gao
Zhejiang Lab, Hangzhou, China