PC-MNet: Dual-Level Congruity Modeling for Multimodal Sarcasm Detection via Polarity-Modulated Attention

📅 2026-05-04
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the pragmatic incongruity between literal semantics and nonverbal cues in multimodal sarcasm detection by proposing a decoupled, dual-level incongruity modeling paradigm. The approach employs a polarity-modulated attention mechanism to capture consistency at both atomic and compositional levels, and integrates scalar consistency routing with a prior-guided contextual graph to enable selective fusion of multi-granular incongruity evidence. Furthermore, an asymmetry-aware optimization strategy driven by incongruity-aware contrastive learning is introduced to enhance model robustness. Evaluated on the MUStARD benchmark and its debiased variant, the proposed method achieves a new state of the art, outperforming the strongest baseline by 3.14% in Macro-F1, thereby substantiating its effectiveness in capturing multimodal sarcasm through fine-grained incongruity modeling.
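To make the contrast with plain similarity-based attention concrete, here is a minimal sketch of one way a polarity term could modulate attention scores. The summary does not give the actual formulation, so everything here is an assumption made for illustration: the function name `polarity_attention`, the scalar `polarity` sign, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def polarity_attention(query, keys, polarity):
    """Scaled dot-product attention weights with a sign applied to the scores.

    ASSUMPTION (not from the paper): polarity=+1 reproduces ordinary
    similarity attention, while polarity=-1 flips the scores so that the
    least similar keys receive the most weight.
    """
    scale = math.sqrt(len(query))
    scores = [
        polarity * sum(q * k for q, k in zip(query, key)) / scale
        for key in keys
    ]
    return softmax(scores)


# Toy example: one text query and two nonverbal cue vectors (hypothetical).
text_query = [1.0, 0.0]
cue_keys = [[1.0, 0.0], [-1.0, 0.0]]  # first cue agrees, second conflicts

congruent = polarity_attention(text_query, cue_keys, +1.0)
incongruent = polarity_attention(text_query, cue_keys, -1.0)
```

With `polarity=+1.0` the agreeing cue gets the larger weight. With `polarity=-1.0` the conflicting cue does, which is the kind of evidence an incongruity-focused model would presumably want to surface.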
πŸ“ Abstract
Multimodal sarcasm detection, which aims to precisely identify pragmatic incongruities between literal text and nonverbal cues, has gained substantial attention in multimodal understanding. Recent advancements have predominantly relied on naïve similarity-based attention mechanisms and uniform late fusion strategies. Furthermore, given that functional entanglement restricts traditional late fusions, we incorporate a scalar congruity routing mechanism and a prior-guided contextual graph. This mechanism anchors a generalized incongruity manifold through a two-stage asymmetric optimization driven by inconsistency-aware contrastive learning, selectively fusing only the most discriminative multi-granularity evidence. Extensive experiments on the MUStARD benchmark and its spurious-correlation-mitigated balanced datasets demonstrate that our approach achieves new state-of-the-art performance, surpassing the strongest multimodal baseline by a substantial 3.14% improvement in Macro-F1. By architecturally isolating atomic, compositional, and contextual conflicts, this work provides a robust, decoupled paradigm for modeling subtle pragmatic incongruities in human communication.
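Since the headline result is reported in Macro-F1, a short sketch of how that metric is computed may help in reading the 3.14% figure. Macro-F1 is the unweighted mean of the per-class F1 scores, so the sarcastic and non-sarcastic classes count equally regardless of how many examples each has. The labels below are toy values chosen for illustration and are not results from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (1 = sarcastic, 0 = not sarcastic); illustrative only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
```

In this toy case the non-sarcastic class scores 8/11 and the sarcastic class scores 0.4, so the macro average is about 0.564 even though accuracy is 0.625. This is why Macro-F1 is a common choice when one class is harder to detect or less frequent.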
Problem

Research questions and friction points this paper is trying to address.

multimodal sarcasm detection
pragmatic incongruity
text-nonverbal incongruity
incongruity modeling
multimodal understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Polarity-Modulated Attention
Congruity Routing
Incongruity Manifold
Contrastive Learning
Multimodal Sarcasm Detection