TMDC: A Two-Stage Modality Denoising and Complementation Framework for Multimodal Sentiment Analysis with Missing and Noisy Modalities

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal sentiment analysis (MSA) often suffers from concurrent modality missing and noisy input in real-world scenarios, severely degrading model robustness and accuracy. To address this, we propose TMDC, a two-stage framework that is the first to jointly model both modality incompleteness and noise corruption. In the first stage, modality-specific and modality-shared representations are jointly denoised to suppress noisy signals; in the second stage, cross-modal complementation mechanisms adaptively impute missing modalities. This design eliminates reliance on fully observed modalities, enhancing the robustness and discriminability of multimodal representations. Extensive experiments on MOSI, MOSEI, and IEMOCAP show that TMDC consistently outperforms existing methods, establishing new state-of-the-art results across multiple benchmarks.

📝 Abstract
Multimodal Sentiment Analysis (MSA) aims to infer human sentiment by integrating information from multiple modalities such as text, audio, and video. In real-world scenarios, however, the presence of missing modalities and noisy signals significantly hinders the robustness and accuracy of existing models. While prior works have made progress on these issues, they are typically addressed in isolation, limiting overall effectiveness in practical settings. To jointly mitigate the challenges posed by missing and noisy modalities, we propose a framework called Two-stage Modality Denoising and Complementation (TMDC). TMDC comprises two sequential training stages. In the Intra-Modality Denoising Stage, denoised modality-specific and modality-shared representations are extracted from complete data using dedicated denoising modules, reducing the impact of noise and enhancing representational robustness. In the Inter-Modality Complementation Stage, these representations are leveraged to compensate for missing modalities, thereby enriching the available information and further improving robustness. Extensive evaluations on MOSI, MOSEI, and IEMOCAP demonstrate that TMDC consistently achieves superior performance compared to existing methods, establishing new state-of-the-art results.
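The two-stage pipeline described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the actual denoising modules and complementation mechanisms are learned networks, whereas here soft-thresholding stands in for intra-modality denoising and a simple average of available modalities stands in for the modality-shared code used to impute what is missing. All function names, the feature dimension `D`, and the threshold value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hypothetical common feature dimension per modality

def denoise(x, threshold=0.1):
    """Stage 1 stand-in: suppress low-magnitude components as 'noise'.

    The paper uses dedicated learned denoising modules; soft-thresholding
    here only illustrates the intra-modality denoising step.
    """
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

def shared_representation(modalities):
    """Average denoised features of available modalities as a shared code."""
    return np.mean(list(modalities.values()), axis=0)

def complement_missing(modalities, all_names):
    """Stage 2 stand-in: impute absent modalities from the shared code."""
    shared = shared_representation(modalities)
    completed = dict(modalities)
    for name in all_names:
        if name not in completed:
            completed[name] = shared  # the real model imputes adaptively
    return completed

# Toy run: video is missing; text and audio are observed but noisy.
names = ["text", "audio", "video"]
observed = {"text": rng.normal(size=D), "audio": rng.normal(size=D)}
denoised = {k: denoise(v) for k, v in observed.items()}
completed = complement_missing(denoised, names)
fused = np.concatenate([completed[n] for n in names])  # fed to a sentiment head
print(fused.shape)  # (24,)
```

The point of the sketch is the data flow: denoising operates per modality on whatever is observed, and complementation only then fills the gaps, so the downstream sentiment head always receives a fixed-size fused representation regardless of which modalities were present.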
Problem

Research questions and friction points this paper is trying to address.

Addresses missing modalities in multimodal sentiment analysis
Mitigates noisy signals across text, audio, and video modalities
Enhances robustness by jointly handling missing and noisy data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage framework for multimodal sentiment analysis
Denoising modules extract clean modality representations
Inter-modality complementation enriches missing modality information