The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses an emerging safety concern in multimodal large language models (MLLMs): while enhanced multi-image reasoning capabilities improve performance, they may also introduce novel security risks. The study presents the first systematic investigation of safety issues in multi-image reasoning, introducing MIR-SafetyBench, a dedicated evaluation benchmark comprising 2,676 samples across nine categories of multi-image relationships. A comprehensive assessment of 19 state-of-the-art MLLMs reveals that stronger reasoning capabilities correlate with a higher propensity to generate unsafe content. Moreover, many ostensibly "safe" responses are found to stem from evasion or misinterpretation rather than genuine safety alignment. Notably, unsafe generations are significantly associated with lower attention entropy, uncovering an intrinsic link between heightened attention concentration and elevated safety risks.

📝 Abstract
As Multimodal Large Language Models (MLLMs) acquire stronger reasoning capabilities to handle complex, multi-image instructions, this advancement may pose new safety risks. We study this problem by introducing MIR-SafetyBench, the first benchmark focused on multi-image reasoning safety, which consists of 2,676 instances across a taxonomy of 9 multi-image relations. Our extensive evaluations on 19 MLLMs reveal a troubling trend: models with more advanced multi-image reasoning can be more vulnerable on MIR-SafetyBench. Beyond attack success rates, we find that many responses labeled as safe are superficial, often driven by misunderstanding or evasive, non-committal replies. We further observe that unsafe generations exhibit lower attention entropy than safe ones on average. This internal signature suggests a possible risk that models may over-focus on task solving while neglecting safety constraints. Our code and data are available at https://github.com/thu-coai/MIR-SafetyBench.
Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models
Multi-image Reasoning
Safety Risks
MLLMs
Safety Benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-image reasoning
safety benchmark
multimodal LLMs
attention entropy
MIR-SafetyBench