🤖 AI Summary
To address the significant performance degradation in multimodal rumor detection caused by missing modalities, this paper proposes TriSPrompt, a hierarchical soft-prompting model. TriSPrompt introduces, for the first time, a learnable hierarchical soft-prompt mechanism that jointly models modality awareness, missing-modality state identification, and subjective–objective perspective interaction, enabling dynamic representation optimization for incomplete multimodal inputs. By parameterizing prompt vectors, the model adaptively refines cross-modal semantic alignment without reconstructing missing modalities. Extensive experiments on three real-world benchmark datasets demonstrate that TriSPrompt achieves an average accuracy improvement of over 13% compared to state-of-the-art methods. It consistently exhibits superior generalization and robustness across diverse modality-missing scenarios. This work establishes a novel paradigm for trustworthy content identification under incomplete multimodal conditions.
📝 Abstract
The widespread presence of incomplete modalities in multimodal data poses a significant challenge to achieving accurate rumor detection. Existing multimodal rumor detection methods primarily focus on learning joint modality representations from emph{complete} multimodal training data, rendering them ineffective in addressing the common occurrence of emph{missing modalities} in real-world scenarios. In this paper, we propose a hierarchical soft prompt model extsf{TriSPrompt}, which integrates three types of prompts, extit{i.e.}, emph{modality-aware} (MA) prompt, emph{modality-missing} (MM) prompt, and emph{mutual-views} (MV) prompt, to effectively detect rumors in incomplete multimodal data. The MA prompt captures both heterogeneous information from specific modalities and homogeneous features from available data, aiding in modality recovery. The MM prompt models missing states in incomplete data, enhancing the model's adaptability to missing information. The MV prompt learns relationships between subjective ( extit{i.e.}, text and image) and objective ( extit{i.e.}, comments) perspectives, effectively detecting rumors. Extensive experiments on three real-world benchmarks demonstrate that extsf{TriSPrompt} achieves an accuracy gain of over 13% compared to state-of-the-art methods. The codes and datasets are available at https: //anonymous.4open.science/r/code-3E88.