Adversarial Attacks Against MLLMs via Progressive Resolution Processing and Adaptive Feature Alignment

📅 2026-05-10

🤖 AI Summary
Existing transferable targeted attacks show limited transferability and robustness against multimodal large language models (MLLMs), which hinders evaluation of black-box security risks. To address this, the work proposes the PRAF-Attack framework, which combines multi-scale global semantic guidance with intermediate-layer local feature alignment. The approach uses gradient-consistency-driven adaptive selection of intermediate layers, patch-level optimization, and a coarse-to-fine progressive resolution strategy to improve attack transferability. In experiments on six open-source and six commercial closed-source MLLMs, PRAF-Attack is reported to outperform seven state-of-the-art baselines in black-box attack performance.
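The coarse-to-fine idea in the summary can be illustrated with a small scheduling sketch. This is not the authors' implementation: the helper name `resolution_schedule`, the example resolutions, and the choice of equal-length stages are all assumptions made for illustration, since the summary only states that optimization proceeds from coarse to fine.

```python
def resolution_schedule(total_steps, resolutions):
    """Assign each optimization step a target-crop resolution, coarse to fine.

    Hypothetical helper: equal-length stages are an assumption; the source
    only says the resolution is refined progressively.
    """
    if total_steps <= 0 or not resolutions:
        raise ValueError("need at least one step and one resolution")
    ordered = sorted(resolutions)  # smallest (coarsest) resolution first
    stages = len(ordered)
    schedule = []
    for step in range(total_steps):
        # Map the step index onto a stage index in [0, stages - 1].
        stage = min(step * stages // total_steps, stages - 1)
        schedule.append(ordered[stage])
    return schedule


if __name__ == "__main__":
    # Six steps over three illustrative resolutions.
    print(resolution_schedule(6, [224, 112, 56]))
```

In an attack loop, the resolution returned for each step would decide how the target crop is downsampled before its features are extracted, so early steps match coarse structure and later steps match fine detail.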
📝 Abstract
Adversarial perturbations can mislead Multimodal Large Language Models (MLLMs) into recognizing a benign image as a specific target object, posing serious risks in safety-critical scenarios such as autonomous driving and medical diagnosis. This makes transfer-based targeted attacks crucial for understanding and improving black-box MLLM robustness. Existing transfer-based targeted attack methods typically rely on the final global features of the surrogate encoder and anchor optimization to original-resolution target crops, which limits their transferability and robustness. To address these challenges, we propose Progressive Resolution Processing and Adaptive Feature Alignment (PRAF-Attack), a targeted transfer-based attack framework that integrates multi-scale global semantic guidance with robust intermediate-layer local alignment. Unlike prior methods that align only the surrogate encoder's final layer, we design an adaptive feature alignment strategy that leverages intermediate representations to enhance transferability. Specifically, we introduce an adaptive intermediate layer selection mechanism that identifies transferable hierarchical features across surrogate ensembles via gradient consistency, along with an adaptive patch-level optimization strategy that preserves highly correlated local regions through efficient patch filtering. To overcome the reliance on fixed original-resolution target crops, we propose a progressive resolution processing strategy that gradually refines optimization from coarse to fine, enabling the attack to better exploit target information at multiple scales and achieve stronger transferability. We evaluate PRAF-Attack on a diverse suite of black-box MLLMs, including six open-source models and six closed-source commercial APIs. Compared with seven state-of-the-art targeted attack baselines, PRAF-Attack consistently achieves superior transferability.
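The two adaptive components named in the abstract, layer selection via gradient consistency and patch filtering, can be sketched roughly as follows. The abstract gives no formulas, so everything here is an assumption: consistency is taken to be the mean pairwise cosine similarity of flattened input gradients across surrogates, patch correlation is taken to be cosine similarity between adversarial and target patch features, and the names `select_layers`, `filter_patches`, and `keep_ratio` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def select_layers(grads_by_layer, k):
    """Pick the k layers whose gradients agree most across surrogates.

    grads_by_layer maps a layer name to a list of flattened gradients,
    one per surrogate model (assumed to share the input's shape).
    """
    scores = {}
    for layer, grads in grads_by_layer.items():
        pairs = list(combinations(grads, 2))
        # Mean pairwise cosine similarity is the assumed consistency score.
        scores[layer] = (
            sum(cosine(u, v) for u, v in pairs) / len(pairs) if pairs else 0.0
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]


def filter_patches(adv_patches, target_patches, keep_ratio):
    """Return indices of the patches most similar to their target patches."""
    sims = [cosine(a, t) for a, t in zip(adv_patches, target_patches)]
    keep = max(1, math.ceil(keep_ratio * len(sims)))
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    grads = {
        "shallow": [[1.0, 0.0], [0.0, 1.0]],   # orthogonal gradients
        "mid": [[1.0, 1.0], [1.0, 0.9]],       # nearly aligned gradients
        "deep": [[1.0, 0.0], [-1.0, 0.0]],     # opposing gradients
    }
    print(select_layers(grads, 2))
    print(filter_patches([[1, 0], [0, 1], [1, 1]], [[1, 0]] * 3, 0.5))
```

Under these assumptions, only the selected layers and the retained patch indices would contribute to the local alignment loss, while the global semantic term would still use the encoder's final features.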
Problem

Research questions and friction points this paper is trying to address.

Adversarial Attacks
Multimodal Large Language Models
Transferability
Targeted Attacks
Black-box Robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Resolution Processing
Adaptive Feature Alignment
Transfer-based Attack
Multimodal Large Language Models
Intermediate-layer Alignment