🤖 AI Summary
Existing cross-domain (out-of-distribution, OOD) multimodal completion methods struggle with missing modalities: they rely heavily on complete training data and require intricate, modality-specific fusion architectures. Method: This paper proposes a training-free "knowledge bridging" framework that leverages large multimodal models (LMMs) to automatically construct modality-agnostic knowledge graphs, grounded in domain priors, for structured information extraction, followed by a two-stage mechanism of semantic-aligned generation and ranking for robust completion. Contribution/Results: It introduces the first fine-tuning-free, modality-agnostic prompting paradigm; jointly optimizes generation and ranking under knowledge-graph guidance; and significantly improves OOD generalization. Evaluated on both general and medical multimodal benchmarks, it outperforms state-of-the-art methods, achieving an average 12.6% improvement in completion quality over direct LMM invocation under OOD conditions.
📝 Abstract
Previous successful approaches to missing modality completion rely on carefully designed fusion techniques and extensive pre-training on complete data, which can limit their generalizability in out-of-domain (OOD) scenarios. In this study, we pose a new challenge: can we develop a missing modality completion model that is both resource-efficient and robust to OOD generalization? To address this, we present a training-free framework for missing modality completion that leverages large multimodal models (LMMs). Our approach, termed the "Knowledge Bridger", is modality-agnostic and integrates generation and ranking of missing modalities. By defining domain-specific priors, our method automatically extracts structured information from available modalities to construct knowledge graphs. These extracted graphs connect the missing modality generation and ranking modules through the LMM, resulting in high-quality imputations of missing modalities. Experimental results across both general and medical domains show that our approach consistently outperforms competing methods, including in OOD generalization. Additionally, our knowledge-driven generation and ranking techniques demonstrate superiority over variants that directly employ LMMs for generation and ranking, offering insights that may be valuable for applications in other domains.
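The extract-then-generate-then-rank pipeline the abstract describes can be sketched as a toy two-stage loop. Everything here is a hypothetical stand-in, not the paper's actual API: `extract_knowledge_graph` mimics LMM-driven structured extraction with simple tokenization, `generate_candidates` mimics KG-guided generation with entity subsets, and `rank_candidates` scores candidates by how well they align with the extracted graph.

```python
# Illustrative sketch of a knowledge-bridging pipeline (assumed names,
# not the paper's implementation). A real system would call an LMM for
# both the extraction and generation steps.

def extract_knowledge_graph(available_modalities):
    """Stand-in for structured extraction: build (modality, relation,
    entity) triples from textual descriptions of available modalities."""
    triples = set()
    for modality, description in available_modalities.items():
        for entity in description.split():
            triples.add((modality, "mentions", entity.lower()))
    return triples

def generate_candidates(knowledge_graph, n=3):
    """Stand-in for KG-guided generation: each candidate is a bag of
    entities; a real LMM would be prompted with the graph instead."""
    entities = sorted({obj for _, _, obj in knowledge_graph})
    return [set(entities[i:]) for i in range(min(n, len(entities)))]

def rank_candidates(candidates, knowledge_graph):
    """Semantic-aligned ranking: order candidates by how many graph
    entities they cover, best first."""
    entities = {obj for _, _, obj in knowledge_graph}
    return sorted(candidates, key=lambda c: len(c & entities), reverse=True)

# Impute a missing image modality from text and audio descriptions.
kg = extract_knowledge_graph({"text": "a dog on grass", "audio": "barking"})
best = rank_candidates(generate_candidates(kg), kg)[0]
```

The point of the sketch is the division of labor: the knowledge graph is the shared, modality-agnostic interface between generation and ranking, so neither module needs a modality-specific fusion architecture.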