MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion

📅 2025-10-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the susceptibility mechanisms of large vision-language models (LVLMs) to multimodal persuasive content (image-text/video-text), addressing the lack of systematic frameworks in prior work. Method: We construct a novel multimodal persuasion dataset spanning commercial, subjective, and adversarial scenarios, and propose the first joint evaluation framework integrating third-party consistency scoring with autoregressive token probability analysis. Leveraging classical persuasion principles, we design semantically aligned image–video–text triads and conduct cross-model comparative experiments to quantify strategy efficacy. Contribution/Results: Multimodal inputs exhibit significantly stronger persuasive power than text alone—especially under misinformation conditions—though pre-existing model preferences only partially mitigate this advantage. Crucially, strategy effectiveness is highly context-dependent. Our work establishes both theoretical foundations and an empirical benchmark for enhancing LVLM robustness and safety against persuasive multimodal attacks.

📝 Abstract
As Large Vision-Language Models (LVLMs) are increasingly deployed in domains such as shopping, health, and news, they are exposed to pervasive persuasive content. A critical question is how these models function as persuadees: how and why they can be influenced by persuasive multimodal inputs. Understanding both their susceptibility to persuasion and the effectiveness of different persuasive strategies is crucial, as overly persuadable models may adopt misleading beliefs, override user preferences, or generate unethical or unsafe outputs when exposed to manipulative messages. We introduce MMPersuade, a unified framework for systematically studying multimodal persuasion dynamics in LVLMs. MMPersuade contributes (i) a comprehensive multimodal dataset that pairs images and videos with established persuasion principles across commercial, subjective and behavioral, and adversarial contexts, and (ii) an evaluation framework that quantifies both persuasion effectiveness and model susceptibility via third-party agreement scoring and self-estimated token probabilities on conversation histories. Our study of six leading LVLMs as persuadees yields three key insights: (i) multimodal inputs substantially increase persuasion effectiveness (and model susceptibility) compared to text alone, especially in misinformation scenarios; (ii) stated prior preferences decrease susceptibility, yet multimodal information maintains its persuasive advantage; and (iii) different strategies vary in effectiveness across contexts, with reciprocity being most potent in commercial and subjective contexts, and credibility and logic prevailing in adversarial contexts. By jointly analyzing persuasion effectiveness and susceptibility, MMPersuade provides a principled foundation for developing models that are robust, preference-consistent, and ethically aligned when engaging with persuasive multimodal content.
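The abstract's susceptibility measure, based on self-estimated token probabilities over conversation histories, can be sketched as comparing the model's probability of agreeing with a claim before and after the persuasive turn. This is a minimal illustrative sketch, not the paper's implementation; the function names, the binary agree/disagree framing, and the example log-probability values are all assumptions.

```python
# Hedged sketch: susceptibility as the change in a model's self-estimated
# agreement probability after a persuasive message is appended to the
# conversation. All names and numbers below are illustrative, not from
# the paper's actual codebase.
import math

def agreement_probability(token_logprobs: dict[str, float]) -> float:
    """Normalize log-probabilities of 'agree' vs. 'disagree' continuation
    tokens into a probability of agreement (softmax over the two options)."""
    p_agree = math.exp(token_logprobs["agree"])
    p_disagree = math.exp(token_logprobs["disagree"])
    return p_agree / (p_agree + p_disagree)

def susceptibility_shift(before: dict[str, float],
                         after: dict[str, float]) -> float:
    """Change in agreement probability induced by the persuasive turn;
    positive values mean the model moved toward the persuader's position."""
    return agreement_probability(after) - agreement_probability(before)

# Hypothetical log-probs: the model leans 'disagree' before persuasion
# and 'agree' after a multimodal persuasive turn is added to the history.
before = {"agree": -2.0, "disagree": -0.2}
after = {"agree": -0.3, "disagree": -1.5}
shift = susceptibility_shift(before, after)  # positive -> persuaded
```

In practice one would read these log-probabilities from the LVLM's output distribution at the answer position; averaging the shift over a dataset of persuasive exchanges gives a per-model susceptibility score that can be compared across strategies and contexts.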
Problem

Research questions and friction points this paper is trying to address.

Studying how multimodal inputs persuade large vision-language models
Evaluating model susceptibility to persuasive strategies across contexts
Developing robust models against manipulative multimodal content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal dataset pairing images and videos with established persuasion principles
Evaluation framework combining third-party agreement scoring with token-probability analysis
Joint analysis of persuasion effectiveness and model susceptibility
Haoyi Qiu
UCLA
Trustworthy AI, Multimodality
Yilun Zhou
Massachusetts Institute of Technology
Machine Learning, Robotics
Pranav Narayanan Venkit
Salesforce AI Research
Kung-Hsiang Huang
Salesforce AI Research
Jiaxin Zhang
Salesforce AI Research
Nanyun Peng
University of California, Los Angeles
Chien-Sheng Wu
Salesforce AI Research