Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios

📅 2024-11-05
🏛️ arXiv.org
📈 Citations: 9
Influential: 0
📄 PDF
🤖 AI Summary
This paper identifies a novel instability phenomenon in multimodal large language models (MLLMs): under misleading prompts, MLLMs reverse their initially correct answers in 65% of cases—a critical vulnerability overlooked by prior work on robustness. Method: To systematically evaluate resistance to misdirection, the authors propose a two-stage misleading-response contrastive paradigm and introduce MUB, the first multimodal uncertainty benchmark covering diverse domains and incorporating both explicit and implicit misleading cues. They design two quantitative metrics—misleading rate and response shift—to assess answer consistency, and enhance model robustness through misleading instruction engineering, response consistency analysis, two-stage sampling, and fine-tuning with injected misleading data. Contribution/Results: State-of-the-art MLLMs exhibit average misleading rates exceeding 86% at baseline, which drop substantially after fine-tuning. The MUB benchmark and code are publicly released.

📝 Abstract
Ensuring that Multimodal Large Language Models (MLLMs) maintain consistency in their responses is essential for developing trustworthy multimodal intelligence. However, existing benchmarks include many samples where all MLLMs *exhibit high response uncertainty when encountering misleading information*, requiring 5-15 response attempts per sample to effectively assess uncertainty. Therefore, we propose a two-stage pipeline: first, we collect MLLMs' responses without misleading information, and then gather misleading ones via specific misleading instructions. By calculating the misleading rate, and capturing both correct-to-incorrect and incorrect-to-correct shifts between the two sets of responses, we can effectively measure the model's response uncertainty. Eventually, we establish a **M**ultimodal **U**ncertainty **B**enchmark (**MUB**) that employs both explicit and implicit misleading instructions to comprehensively assess the vulnerability of MLLMs across diverse domains. Our experiments reveal that all open-source and closed-source MLLMs are highly susceptible to misleading instructions, with an average misleading rate exceeding 86%. To enhance the robustness of MLLMs, we further fine-tune all open-source MLLMs by incorporating explicit and implicit misleading data, which demonstrates a significant reduction in misleading rates. Our code is available at: https://github.com/Yunkai696/MUB
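The misleading-rate and response-shift computation described in the abstract can be sketched as follows (a minimal Python sketch, assuming responses are compared as normalized answer strings; the function and variable names are illustrative and not taken from the released MUB code):

```python
def misleading_metrics(gold, initial, misled):
    """Compare responses from the two pipeline stages.

    gold    -- ground-truth answers
    initial -- stage-1 responses (no misleading information)
    misled  -- stage-2 responses (after misleading instructions)

    Returns (misleading_rate, correct_to_incorrect, incorrect_to_correct),
    where misleading_rate is the fraction of initially correct answers
    that flip to incorrect under misleading instructions.
    """
    correct_to_incorrect = 0  # correct answer overturned by misdirection
    incorrect_to_correct = 0  # wrong answer accidentally fixed
    initially_correct = 0
    for g, a, b in zip(gold, initial, misled):
        if a == g:
            initially_correct += 1
            if b != g:
                correct_to_incorrect += 1
        elif b == g:
            incorrect_to_correct += 1
    rate = correct_to_incorrect / initially_correct if initially_correct else 0.0
    return rate, correct_to_incorrect, incorrect_to_correct
```

On a toy sample where three of four initial answers are correct and one of those flips, the misleading rate is 1/3.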
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs' vulnerability to misleading cues causing answer flips
Quantifying response uncertainty when correct answers are overturned
Assessing MLLMs' consistency preservation under deceptive information scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage evaluation pipeline for vulnerability quantification
Fine-tuning with compact mixed-instruction dataset
Multimodal Uncertainty Benchmark with stratified difficulty levels
Yunkai Dang
The Hong Kong University of Science and Technology (Guangzhou)
Mengxi Gao
The Hong Kong University of Science and Technology (Guangzhou)
Yibo Yan
East China Normal University
High-dimensional Statistics
Xin Zou
The Hong Kong University of Science and Technology (Guangzhou)
Yanggan Gu
Soochow University
Natural Language Processing · Language Model
Aiwei Liu
Tsinghua University
Natural Language Processing · Large Language Models · AI Safety · Watermarking
Xuming Hu
Assistant Professor, HKUST(GZ) / HKUST
Natural Language Processing · Large Language Model