Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

📅 2026-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses adversarial attacks on large vision-language models (LVLMs) in black-box settings, where inaccessible gradients and complex multimodal alignment hinder effective transferability. Existing approaches often suffer from high gradient variance and semantic inconsistency, limiting their cross-model generalization. To overcome these issues, the authors propose M-Attack-V2, which restructures local matching through Multi-Crop Alignment (MCA) on the source side and Auxiliary Target Alignment (ATA) on the target side, while incorporating Patch Momentum and a refined Patch Ensemble with adaptive block sizing (PE+) to stabilize gradient estimation and enhance semantic coherence. The method significantly improves transfer-based attack performance, achieving success rates of 30%, 97%, and 100% on Claude-4.0, Gemini-2.5-Pro, and GPT-5, respectively, substantially outperforming current black-box attack strategies.

📝 Abstract
Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we find this induces high-variance, nearly orthogonal gradients across iterations, violating coherent local alignment and destabilizing optimization. We attribute this to (i) ViT translation sensitivity that yields spike-like gradients and (ii) structural asymmetry between source and target crops. We reformulate local matching as an asymmetric expectation over source transformations and target semantics, and build a gradient-denoising upgrade to M-Attack. On the source side, Multi-Crop Alignment (MCA) averages gradients from multiple independently sampled local views per iteration to reduce variance. On the target side, Auxiliary Target Alignment (ATA) replaces aggressive target augmentation with a small auxiliary set from a semantically correlated distribution, producing a smoother, lower-variance target manifold. We further reinterpret momentum as Patch Momentum, replaying historical crop gradients; combined with a refined patch-size ensemble (PE+), this strengthens transferable directions. Together these modules form M-Attack-V2, a simple, modular enhancement over M-Attack that substantially improves transfer-based black-box attacks on frontier LVLMs: boosting success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming prior black-box LVLM attacks. Code and data are publicly available at: https://github.com/vila-lab/M-Attack-V2.
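The core MCA idea described in the abstract — averaging per-crop gradients from several independently sampled local views each iteration to reduce variance — can be sketched with a toy NumPy example. This is a minimal illustration under stated assumptions, not the authors' implementation: the quadratic matching loss, function names, and crop scheme are all placeholders standing in for the actual embedding-space objective.

```python
import numpy as np

def random_crop(img, size, rng):
    """Sample one local view: a random size x size crop and its position."""
    h, w = img.shape
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    return (y, x), img[y:y + size, x:x + size]

def crop_gradient(crop, target_crop):
    """Toy surrogate gradient: d/dcrop of 0.5 * ||crop - target_crop||^2.
    (The real method matches crops in a ViT embedding space instead.)"""
    return crop - target_crop

def mca_gradient(src, tgt, size, n_views, rng):
    """Multi-Crop Alignment sketch: average the gradients of n_views
    independently sampled local views, scattered back into a
    full-image gradient buffer."""
    g = np.zeros_like(src)
    for _ in range(n_views):
        (y, x), crop = random_crop(src, size, rng)
        g[y:y + size, x:x + size] += crop_gradient(crop, tgt[y:y + size, x:x + size])
    return g / n_views

rng = np.random.default_rng(0)
src = rng.normal(size=(32, 32))  # stand-in for the source image
tgt = rng.normal(size=(32, 32))  # stand-in for the target image

# Estimate gradient variance across repeated draws: averaging more
# views per iteration yields a lower-variance update direction.
g1 = np.stack([mca_gradient(src, tgt, 16, 1, rng) for _ in range(200)])
g8 = np.stack([mca_gradient(src, tgt, 16, 8, rng) for _ in range(200)])
print(g1.std(axis=0).mean(), g8.std(axis=0).mean())
```

The variance gap between the single-view and multi-view estimates mirrors the paper's motivation: single-crop gradients are spiky and nearly orthogonal across iterations, while the multi-crop average points in a steadier, more transferable direction.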
Problem

Research questions and friction points this paper is trying to address.

black-box attack
Large Vision-Language Models
adversarial attacks
transfer-based attack
multimodal boundaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box adversarial attack
Large Vision-Language Models
Transfer-based attack
Gradient denoising
Multi-Crop Alignment