🤖 AI Summary
Existing Multimodal Conversational Stance Detection (MCSD) research suffers from two critical limitations: pseudo-multimodality, where only source posts contain images while comments are treated as text-only, and user homogeneity, where individual differences among users are ignored. To address these issues, we introduce U-MStance, the first user-centric MCSD dataset, containing over 40k annotated comments across six real-world targets. We further propose PRISM, a Persona-Reasoned multimodal Stance Model featuring: (1) longitudinal user personas derived from historical posts and comments to capture individual traits; (2) Chain-of-Thought alignment of textual and visual cues within the conversational context, enabling fine-grained cross-modal inference; and (3) mutual task reinforcement that jointly optimizes stance detection and stance-aware response generation for bidirectional knowledge transfer. Extensive experiments demonstrate that PRISM significantly outperforms strong baselines on U-MStance, validating the effectiveness of user-aware representation learning and context-grounded multimodal reasoning.
📝 Abstract
The rapid proliferation of multimodal social media content has driven research in Multimodal Conversational Stance Detection (MCSD), which aims to interpret users' attitudes toward specific targets within complex discussions. However, existing studies remain limited by: **1) pseudo-multimodality**, where visual cues appear only in source posts while comments are treated as text-only, misaligning with real-world multimodal interactions; and **2) user homogeneity**, where diverse users are treated uniformly, neglecting personal traits that shape stance expression. To address these issues, we introduce **U-MStance**, the first user-centric MCSD dataset, containing over 40k annotated comments across six real-world targets. We further propose **PRISM**, a **P**ersona-**R**easoned mult**I**modal **S**tance **M**odel for MCSD. PRISM first derives longitudinal user personas from historical posts and comments to capture individual traits, then aligns textual and visual cues within conversational context via Chain-of-Thought to bridge semantic and pragmatic gaps across modalities. Finally, a mutual task reinforcement mechanism is employed to jointly optimize stance detection and stance-aware response generation for bidirectional knowledge transfer. Experiments on U-MStance demonstrate that PRISM yields significant gains over strong baselines, underscoring the effectiveness of user-centric and context-grounded multimodal reasoning for realistic stance understanding.
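The abstract describes the mutual task reinforcement mechanism only at a high level. A minimal sketch of the generic idea, jointly optimizing a stance-classification loss and a response-generation loss in one objective, might look as follows; the label set, the weighting term `lambda_gen`, and the toy probabilities are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of joint stance-detection + response-generation training:
# a weighted sum of a classification loss and a generation loss, so that
# gradients from each task can inform the shared representation.
# STANCES, lambda_gen, and all numbers below are assumed for illustration.
import math

STANCES = ["favor", "against", "neutral"]  # assumed label set

def stance_loss(probs, gold):
    """Cross-entropy over the stance label distribution."""
    return -math.log(probs[STANCES.index(gold)])

def generation_loss(token_probs):
    """Mean negative log-likelihood of the gold response tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def joint_loss(probs, gold, token_probs, lambda_gen=0.5):
    """Weighted sum enabling bidirectional knowledge transfer."""
    return stance_loss(probs, gold) + lambda_gen * generation_loss(token_probs)

# Toy example: a confident "favor" prediction and a likely gold response.
probs = [0.7, 0.2, 0.1]          # model's stance distribution
token_probs = [0.9, 0.8, 0.95]   # per-token likelihood of the gold response
loss = joint_loss(probs, "favor", token_probs)
```

In practice both losses would be computed from a shared encoder, so improving stance prediction sharpens the generated responses and vice versa; the scalar form here only illustrates how the two objectives combine.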