PRISM of Opinions: A Persona-Reasoned Multimodal Framework for User-centric Conversational Stance Detection

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal comment stance detection (MCSD) research suffers from two critical limitations: pseudo-multimodality, where only the source post contains images while comments are purely textual, and user homogenization, where individual differences between users are ignored. To address these issues, we introduce U-MStance, the first user-centric multimodal stance detection dataset. We further propose PRISM, a novel framework featuring: (1) longitudinal user history modeling to capture personalized behavioral patterns; (2) Chain-of-Thought alignment of visual and textual cues, enabling fine-grained cross-modal inference; and (3) bidirectional, mutually reinforcing learning between stance detection and response generation. Extensive experiments demonstrate that PRISM significantly outperforms strong baselines on U-MStance, validating the effectiveness of user-aware representation learning and context-sensitive multimodal reasoning.

📝 Abstract
The rapid proliferation of multimodal social media content has driven research in Multimodal Conversational Stance Detection (MCSD), which aims to interpret users' attitudes toward specific targets within complex discussions. However, existing studies remain limited by: **1) pseudo-multimodality**, where visual cues appear only in source posts while comments are treated as text-only, misaligning with real-world multimodal interactions; and **2) user homogeneity**, where diverse users are treated uniformly, neglecting personal traits that shape stance expression. To address these issues, we introduce **U-MStance**, the first user-centric MCSD dataset, containing over 40k annotated comments across six real-world targets. We further propose **PRISM**, a **P**ersona-**R**easoned mult**I**modal **S**tance **M**odel for MCSD. PRISM first derives longitudinal user personas from historical posts and comments to capture individual traits, then aligns textual and visual cues within conversational context via Chain-of-Thought to bridge semantic and pragmatic gaps across modalities. Finally, a mutual task reinforcement mechanism is employed to jointly optimize stance detection and stance-aware response generation for bidirectional knowledge transfer. Experiments on U-MStance demonstrate that PRISM yields significant gains over strong baselines, underscoring the effectiveness of user-centric and context-grounded multimodal reasoning for realistic stance understanding.
Problem

Research questions and friction points this paper is trying to address.

Addresses pseudo-multimodality in conversational stance detection
Resolves user homogeneity by incorporating personal traits
Aligns multimodal cues through persona-reasoned contextual analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Derives user personas from historical multimodal data
Aligns text and visual cues via Chain-of-Thought reasoning
Jointly optimizes stance detection and response generation
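The three innovations above form a pipeline: persona derivation, Chain-of-Thought cross-modal alignment, then joint stance/response optimization. The toy sketch below illustrates that data flow only; every function, class, and keyword rule here is a hypothetical stand-in (the paper's actual model uses learned components, not these heuristics).

```python
# Hypothetical sketch of a PRISM-style pipeline. All names and the
# keyword heuristics are illustrative, NOT the authors' implementation.
from dataclasses import dataclass

@dataclass
class UserHistory:
    posts: list      # user's historical posts
    comments: list   # user's historical comments

def derive_persona(history: UserHistory) -> dict:
    """Stage 1: summarize longitudinal behavior into a persona profile."""
    texts = history.posts + history.comments
    # Toy proxy: count stance-laden keywords to approximate user traits.
    favor = sum(t.lower().count("support") for t in texts)
    against = sum(t.lower().count("oppose") for t in texts)
    return {"leaning": "favor" if favor >= against else "against",
            "activity": len(texts)}

def cot_align(comment: str, image_caption: str, context: str,
              persona: dict) -> str:
    """Stage 2: chain textual, visual, and persona cues into one
    reasoning trace (standing in for Chain-of-Thought alignment)."""
    return (f"Context: {context}\nImage: {image_caption}\n"
            f"Comment: {comment}\nPersona leaning: {persona['leaning']}")

def predict_stance(trace: str) -> str:
    """Stage 3a: stance head (keyword rule in place of a classifier)."""
    return "favor" if "support" in trace.lower() else "against"

def generate_response(trace: str, stance: str) -> str:
    """Stage 3b: stance-aware response generation, jointly optimized
    with 3a in the paper's mutual task reinforcement mechanism."""
    return f"[{stance}] reply grounded in: {trace.splitlines()[0]}"
```

In the actual framework, stages 3a and 3b would share parameters and gradients so each task regularizes the other; here they merely share the reasoning trace.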
Bingbing Wang
Harbin Institute of Technology, Shenzhen
natural language processing
Zhixin Bai
Harbin Institute of Technology
natural language processing
Zhengda Jin
Harbin Institute of Technology, Shenzhen, China
Zihan Wang
Harbin Institute of Technology, Shenzhen, China
Xintong Song
Harbin Institute of Technology, Shenzhen, China
Jingjie Lin
Harbin Institute of Technology, Shenzhen, China
Sixuan Li
Macau University of Science and Technology, Macau, China
Jing Li
The Hong Kong Polytechnic University, Hong Kong, China
Ruifeng Xu
Professor, Harbin Institute of Technology at Shenzhen
Natural Language Processing · Affective Computing · Argumentation Mining · LLMs · Bioinformatics