Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios

📅 2024-11-05
🏛️ arXiv.org
📈 Citations: 9
Influential: 0
📄 PDF
🤖 AI Summary
This paper identifies a novel instability phenomenon in multimodal large language models (MLLMs): under misleading prompts, MLLMs reverse their initially correct answers in 65% of cases—a critical vulnerability overlooked by prior work on robustness. Method: To systematically evaluate resistance to misdirection, the authors propose a two-stage misleading-response contrastive paradigm and introduce MUB, the first multimodal uncertainty benchmark covering diverse domains and incorporating both explicit and implicit misleading cues. They design two quantitative metrics—misleading rate and response shift—to assess answer consistency, and enhance model robustness through misleading instruction engineering, response consistency analysis, two-stage sampling, and fine-tuning with injected misleading data. Contribution/Results: State-of-the-art MLLMs exhibit average misleading rates exceeding 86% at baseline, which drop substantially after fine-tuning. The MUB benchmark and code are publicly released.

📝 Abstract
Ensuring that Multimodal Large Language Models (MLLMs) maintain consistency in their responses is essential for developing trustworthy multimodal intelligence. However, existing benchmarks include many samples where all MLLMs *exhibit high response uncertainty when encountering misleading information*, requiring 5-15 response attempts per sample to effectively assess uncertainty. Therefore, we propose a two-stage pipeline: first, we collect MLLMs' responses without misleading information, and then gather misleading ones via specific misleading instructions. By calculating the misleading rate, and capturing both correct-to-incorrect and incorrect-to-correct shifts between the two sets of responses, we can effectively measure the model's response uncertainty. Eventually, we establish a **M**ultimodal **U**ncertainty **B**enchmark (**MUB**) that employs both explicit and implicit misleading instructions to comprehensively assess the vulnerability of MLLMs across diverse domains. Our experiments reveal that all open-source and closed-source MLLMs are highly susceptible to misleading instructions, with an average misleading rate exceeding 86%. To enhance the robustness of MLLMs, we further fine-tune all open-source MLLMs by incorporating explicit and implicit misleading data, which demonstrates a significant reduction in misleading rates. Our code is available at: https://github.com/Yunkai696/MUB
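The misleading-rate and response-shift computation described in the abstract can be sketched as follows (a minimal Python sketch, assuming responses are compared as normalized answer strings; the function and variable names are illustrative and not taken from the released MUB code):

```python
def misleading_metrics(gold, initial, misled):
    """Compare responses from the two pipeline stages.

    gold    -- ground-truth answers
    initial -- stage-1 responses (no misleading information)
    misled  -- stage-2 responses (after misleading instructions)

    Returns (misleading_rate, correct_to_incorrect, incorrect_to_correct),
    where misleading_rate is the fraction of initially correct answers
    that flip to incorrect under misleading instructions.
    """
    correct_to_incorrect = 0  # correct answer overturned by misdirection
    incorrect_to_correct = 0  # wrong answer accidentally fixed
    initially_correct = 0
    for g, a, b in zip(gold, initial, misled):
        if a == g:
            initially_correct += 1
            if b != g:
                correct_to_incorrect += 1
        elif b == g:
            incorrect_to_correct += 1
    rate = correct_to_incorrect / initially_correct if initially_correct else 0.0
    return rate, correct_to_incorrect, incorrect_to_correct
```

On a toy sample where three of four initial answers are correct and one of those flips, the misleading rate is 1/3.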
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs' vulnerability to misleading cues causing answer flips
Quantifying response uncertainty when correct answers are overturned
Assessing MLLMs' consistency preservation under deceptive information scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage evaluation pipeline for vulnerability quantification
Fine-tuning with compact mixed-instruction dataset
Multimodal Uncertainty Benchmark with stratified difficulty levels
Yunkai Dang
The Hong Kong University of Science and Technology (Guangzhou)
Mengxi Gao
The Hong Kong University of Science and Technology (Guangzhou)
Yibo Yan
East China Normal University
High-dimensional Statistics
Xin Zou
The Hong Kong University of Science and Technology (Guangzhou)
Yanggan Gu
Soochow University
Natural Language Processing · Language Model
Aiwei Liu
Tsinghua University
Natural Language Processing · Large Language Models · AI Safety · Watermarking
Xuming Hu
Assistant Professor, HKUST(GZ) / HKUST
Natural Language Processing · Large Language Model