🤖 AI Summary
This work addresses three fundamental challenges in handling uncertainty for large multimodal models (LMMs): how to assess, elicit, and quantify it. To this end, we propose the first model-agnostic, unified framework. Methodologically, (i) we introduce a multimodal semantic uncertainty modeling mechanism; (ii) we design a prompt-perturbation paradigm that actively elicits uncertainty; and (iii) we define a response-consistency metric with a plug-and-play interface compatible with both open- and closed-weight LMMs. The key contribution is the first cross-architecture, cross-modal framework for unified uncertainty assessment and controllable elicitation. Extensive evaluation across 18 multimodal benchmarks and 10 diverse LMMs demonstrates significant improvements in hallucination detection and mitigation, as well as stronger chain-of-thought reasoning grounded in calibrated uncertainty awareness.
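The response-consistency metric is not spelled out in this summary; the sketch below shows one common way such a metric can be instantiated, in the style of semantic entropy: sampled answers are clustered by meaning and the entropy over clusters is the uncertainty score. All names here, including the `equivalent` predicate (e.g., a bidirectional NLI entailment check), are illustrative assumptions rather than the paper's actual interface.

```python
import math

def semantic_clusters(responses, equivalent):
    """Greedily group responses whose meanings the `equivalent`
    predicate judges to be the same (e.g., mutual NLI entailment).
    `equivalent` is a hypothetical, user-supplied callable."""
    clusters = []
    for r in responses:
        for cluster in clusters:
            if equivalent(r, cluster[0]):
                cluster.append(r)
                break
        else:
            clusters.append([r])  # r starts a new semantic cluster
    return clusters

def semantic_uncertainty(responses, equivalent):
    """Entropy over semantic clusters: low when sampled answers
    agree in meaning, high when they scatter across meanings."""
    clusters = semantic_clusters(responses, equivalent)
    n = len(responses)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Toy usage: three samples, two distinct meanings -> nonzero entropy.
h = semantic_uncertainty(
    ["Paris", "paris.", "Lyon"],
    lambda a, b: a.strip(".").lower() == b.strip(".").lower(),
)
```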
📝 Abstract
Large Multimodal Models (LMMs), which harness the complementarity among diverse modalities, are often considered more robust than pure-text Large Language Models (LLMs); yet do LMMs know what they do not know? Three key open questions remain: (1) how to evaluate the uncertainty of diverse LMMs in a unified manner, (2) how to prompt LMMs to reveal their uncertainty, and (3) how to quantify uncertainty for downstream tasks. To address these challenges, we introduce Uncertainty-o: (1) a model-agnostic framework designed to reveal uncertainty in LMMs regardless of their modalities, architectures, or capabilities; (2) an empirical exploration of multimodal prompt perturbations for uncovering LMM uncertainty, with accompanying insights and findings; and (3) a formulation of multimodal semantic uncertainty, which enables quantifying uncertainty from multimodal responses. Experiments across 18 benchmarks spanning various modalities and 10 LMMs (both open- and closed-source) demonstrate the effectiveness of Uncertainty-o in reliably estimating LMM uncertainty, thereby enhancing downstream tasks such as hallucination detection, hallucination mitigation, and uncertainty-aware Chain-of-Thought reasoning.
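To make the perturb-then-measure idea concrete, here is a minimal sketch of the elicitation loop the abstract describes: the same multimodal query is rephrased several times, the LMM is sampled once per variant, and semantic disagreement among the answers is scored with the `semantic_uncertainty` helper above. The `lmm` and `perturb` callables are placeholders under assumed signatures; Uncertainty-o's actual perturbation strategies are explored empirically across modalities and are not reproduced here.

```python
def elicit_uncertainty(lmm, image, question, perturb, equivalent, k=5):
    """Ask the same question k times under prompt perturbations and
    score how much the answers disagree semantically.

    lmm:        callable (image, prompt) -> answer string (placeholder)
    perturb:    callable (question, i) -> i-th rephrased prompt (placeholder)
    equivalent: semantic-equivalence predicate, as sketched above
    """
    responses = [lmm(image, perturb(question, i)) for i in range(k)]
    return semantic_uncertainty(responses, equivalent), responses
```

A high score could then gate downstream behavior, e.g., flagging a likely hallucination or triggering an uncertainty-aware Chain-of-Thought re-query, matching the downstream tasks evaluated in the paper.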