PC-MNet: Dual-Level Congruity Modeling for Multimodal Sarcasm Detection via Polarity-Modulated Attention

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the pragmatic incongruity between literal semantics and nonverbal cues in multimodal sarcasm detection by proposing a decoupled, dual-level congruity modeling paradigm. The approach employs a polarity-modulated attention mechanism to capture congruity at both atomic and compositional levels, and integrates scalar congruity routing with a prior-guided contextual graph to enable selective fusion of multi-granular incongruity evidence. A two-stage asymmetric optimization strategy driven by inconsistency-aware contrastive learning is further introduced to enhance model robustness. Evaluated on the MUStARD benchmark and its debiased variant, the proposed method achieves a new state of the art, outperforming the strongest baseline by 3.14% in Macro-F1, which supports its effectiveness in capturing multimodal sarcasm through fine-grained incongruity modeling.

📝 Abstract

Multimodal sarcasm detection, which aims to precisely identify pragmatic incongruities between literal text and nonverbal cues, has gained substantial attention in multimodal understanding. Recent advancements have predominantly relied on naïve similarity-based attention mechanisms and uniform late fusion strategies. Furthermore, given that functional entanglement restricts traditional late fusion, we incorporate a scalar congruity routing mechanism and a prior-guided contextual graph. This mechanism anchors a generalized incongruity manifold through a two-stage asymmetric optimization driven by inconsistency-aware contrastive learning, selectively fusing only the most discriminative multi-granularity evidence. Extensive experiments on the MUStARD benchmark and its spurious-correlation-mitigated balanced datasets demonstrate that our approach achieves new state-of-the-art performance, surpassing the strongest multimodal baseline by a substantial 3.14% in Macro-F1. By architecturally isolating atomic, compositional, and contextual conflicts, this work provides a robust, decoupled paradigm for modeling subtle pragmatic incongruities in human communication.
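
The headline result is reported in Macro-F1. As a point of reference, the sketch below shows the conventional definition of that metric: the unweighted mean of per-class F1 scores. The function name `macro_f1` and the toy labels are illustrative assumptions; this is not the paper's evaluation script, and the abstract does not say how the 3.14% gap was measured.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores (conventional Macro-F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Score every class that appears in either the labels or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class has no hits.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example (hypothetical labels): 1 = sarcastic, 0 = not sarcastic.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0]
score = macro_f1(y_true, y_pred)
print(f"Macro-F1: {score:.4f}")
```

Because each class contributes equally regardless of its frequency, Macro-F1 penalizes a model that neglects the minority class, which is one reason it is commonly preferred over accuracy when class balance is a concern.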

Problem

Research questions and friction points this paper is trying to address.

multimodal sarcasm detection

pragmatic incongruity

text-nonverbal incongruity

incongruity modeling

multimodal understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Polarity-Modulated Attention

Congruity Routing

Incongruity Manifold