Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing evaluations of multimodal large language models (MLLMs) are fragmented and overlook modality-specific risks, hindering holistic trustworthiness assessment. Method: We introduce MultiTrust-X, a comprehensive benchmark covering truthfulness, robustness, safety, fairness, and privacy, grounded in a three-dimensional analytical framework. The framework distinguishes two previously under-examined risk types, multimodal risks and cross-modal impacts, and exposes the limitations and unintended side effects of current mitigation strategies. The benchmark integrates 32 tasks, 28 datasets, and 8 mainstream mitigation methods to evaluate over 30 MLLMs. Additionally, we propose Reasoning-Enhanced Safety Alignment (RESA), a novel alignment technique that improves overall trustworthiness without compromising model performance. Contribution/Results: MultiTrust-X establishes an empirical foundation and a new paradigm for trustworthy multimodal intelligence, enabling rigorous, standardized, and modality-aware evaluation of MLLM trustworthiness.

📝 Abstract
The trustworthiness of Multimodal Large Language Models (MLLMs) remains a pressing concern despite the significant progress in their capabilities. Existing evaluation and mitigation approaches often focus on narrow aspects and overlook risks introduced by multimodality. To tackle these challenges, we propose MultiTrust-X, a comprehensive benchmark for evaluating, analyzing, and mitigating the trustworthiness issues of MLLMs. We define a three-dimensional framework encompassing five trustworthiness aspects (truthfulness, robustness, safety, fairness, and privacy); two novel risk types (multimodal risks and cross-modal impacts); and mitigation strategies spanning data, model architecture, training, and inference algorithms. Based on this taxonomy, MultiTrust-X includes 32 tasks and 28 curated datasets, enabling holistic evaluations over 30 open-source and proprietary MLLMs and in-depth analysis with 8 representative mitigation methods. Our extensive experiments reveal significant vulnerabilities in current models, including a gap between trustworthiness and general capabilities, as well as the amplification of potential risks in base LLMs by both multimodal training and inference. Moreover, our controlled analysis uncovers key limitations in existing mitigation strategies: while some methods yield improvements in specific aspects, few effectively address overall trustworthiness, and many introduce unexpected trade-offs that compromise model utility. These findings also provide practical insights for future improvements, such as the benefits of reasoning for better balancing safety and performance. Based on these insights, we introduce a Reasoning-Enhanced Safety Alignment (RESA) approach that equips the model with chain-of-thought reasoning ability to discover the underlying risks, achieving state-of-the-art results.
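For readers who want a concrete handle on the taxonomy above, here is a minimal sketch, assuming nothing beyond the abstract, of how the three dimensions (trust aspects, risk types, mitigation levels) could be represented in code; all names and the example task are illustrative, not the benchmark's actual schema:

```python
# Illustrative, non-authoritative encoding of the three-dimensional taxonomy
# described in the abstract. The real 32-task / 28-dataset breakdown is defined
# by MultiTrust-X itself and is not reproduced here.
from dataclasses import dataclass, field

ASPECTS = ["truthfulness", "robustness", "safety", "fairness", "privacy"]
RISK_TYPES = ["multimodal_risk", "cross_modal_impact"]
MITIGATION_LEVELS = ["data", "architecture", "training", "inference"]

@dataclass
class Task:
    name: str
    aspect: str                    # one of ASPECTS
    risk_type: str                 # one of RISK_TYPES
    datasets: list[str] = field(default_factory=list)

# Hypothetical example: a jailbreak-style task framed as a multimodal safety risk.
example = Task(name="typographic_jailbreak", aspect="safety",
               risk_type="multimodal_risk", datasets=["example_dataset"])
```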
Problem

Research questions and friction points this paper is trying to address.

Evaluating trustworthiness vulnerabilities in multimodal large language models
Analyzing multimodal risks and cross-modal impacts on model safety
Developing mitigation strategies for truthfulness, robustness, and privacy issues
Innovation

Methods, ideas, or system contributions that make the work stand out.

MultiTrust-X benchmark for holistic MLLM trust evaluation
Three-dimensional framework covering five trust aspects
Reasoning-Enhanced Safety Alignment (RESA) with chain-of-thought risk analysis (see the sketch below)
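As a rough, non-authoritative illustration of the idea, here is a minimal sketch of what reasoning-enhanced, chain-of-thought safety alignment could look like at inference time, assuming a generic chat-style MLLM interface; the prompt and function names below (`query_mllm`, `safe_answer`) are hypothetical and not from the paper:

```python
# Hypothetical sketch: the model first reasons step by step about potential risks
# in the combined image-text input, then answers or refuses based on that analysis.
# `query_mllm` stands in for any chat-style MLLM call and is not from the paper.

RISK_ANALYSIS_PROMPT = (
    "Before answering, reason step by step about whether the image and the "
    "request together involve harmful, deceptive, private, or unfair content. "
    "Conclude with 'RISK: yes' or 'RISK: no'."
)

def query_mllm(image_path: str, prompt: str) -> str:
    """Placeholder for an MLLM call (open-source model or API); replace as needed."""
    raise NotImplementedError

def safe_answer(image_path: str, user_request: str) -> str:
    # Step 1: chain-of-thought risk analysis over the multimodal input.
    analysis = query_mllm(image_path, f"{RISK_ANALYSIS_PROMPT}\n\nRequest: {user_request}")
    # Step 2: answer normally, or refuse, depending on the risk verdict.
    if "RISK: yes" in analysis:
        return "I can't help with that request because it appears to involve unsafe content."
    return query_mllm(image_path, user_request)
```

Note that the abstract describes RESA as an alignment approach that equips the model itself with this reasoning ability, rather than a pure prompting trick; the sketch only conveys the intended risk-aware reasoning flow before answering.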
👥 Authors
Yichi Zhang
Department of Computer Science and Technology, College of AI, Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, 100084, China
Yao Huang
Institute of Artificial Intelligence, Beihang University
Trustworthy ML, Multimodal Learning
Yifan Wang
Department of Computer Science and Technology, College of AI, Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, 100084, China
Yitong Sun
Institute of Artificial Intelligence, Beihang University, Beijing, 100191, China
Chang Liu
Department of Computer Science and Technology, College of AI, Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, 100084, China
Zhe Zhao
RealAI, Beijing, 100085, China
Zhengwei Fang
Beijing Jiaotong University
Adversarial Robustness, Vision Language Models, Computer Vision, Uncertainty
Huanran Chen
PhD student, Tsinghua SAIL
Machine Learning Theory, Optimization, AI Safety
Xiao Yang
Department of Computer Science and Technology, College of AI, Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, 100084, China
Xingxing Wei
Professor of Artificial Intelligence, Beihang University
Computer Vision, Adversarial Machine Learning
Hang Su
Department of Computer Science and Technology, College of AI, Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, 100084, China
Yinpeng Dong
Tsinghua University
Machine Learning, Deep Learning, AI Safety
Jun Zhu
Department of Computer Science and Technology, College of AI, Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, 100084, China