Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited transferability of adversarial attacks against closed-source multimodal large language models (MLLMs) by proposing a frequency-domain regularization method that treats feature alignment and gradient optimization together. For feature alignment, a high-pass discrete cosine transform (DCT) objective on patch features emphasizes high-frequency semantic cues; for gradient optimization, a model-agnostic low-pass gradient regularizer suppresses surrogate-specific artifacts, separating transferable semantic signals from noise. Notably, this is the first method to apply frequency-domain modeling simultaneously at both the feature and gradient levels for adversarial transfer. Evaluated on 15 mainstream MLLMs from seven vendors, the technique improves cross-model attack success rates and achieves state-of-the-art performance on GPT-5.4, Claude-Opus-4.6, and Gemini-3-flash.

📝 Abstract

Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, where perturbations optimized on open-source surrogate encoders can generalize to closed-source MLLMs. A key challenge in improving adversarial transferability is capturing the intrinsic visual focus shared across different models, so that perturbations align with transferable semantic cues rather than surrogate-specific behaviors. However, existing methods suffer from spatial-domain feature redundancy and surrogate-specific gradient signals, both of which hinder cross-model transferability. In this paper, we propose FRA-Attack, which addresses both challenges from a unified frequency-domain regularization perspective. For feature alignment, a high-pass DCT objective on patch features suppresses redundant global structures and concentrates the loss on the high-frequency band that carries the MLLMs' intrinsic visual focus. For gradient optimization, we introduce Frequency-domain Gradient Regularization (FGR), a low-pass regularizer that modulates the surrogate gradient using only the geometric frequency coordinate. Because no surrogate-derived statistic is involved, FGR is model-agnostic by construction: it removes surrogate-specific high-frequency artifacts while preserving transferable low-frequency directions. Together, the two components form a unified frequency-domain treatment of transferability. Extensive experiments on 15 flagship MLLMs across 7 vendors show that FRA-Attack achieves superior cross-model transferability, with state-of-the-art performance on GPT-5.4, Claude-Opus-4.6, and Gemini-3-flash.
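
The two frequency-domain operations named in the abstract can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation: the abstract does not give the exact loss, mask shape, or hyperparameters, so the squared-error loss, the Gaussian low-pass weight, and the names `high_pass_alignment_loss`, `low_pass_gradient`, `cutoff`, and `sigma` are all assumptions chosen for illustration. Each grid is a single-channel 2-D list standing in for a patch-feature map or an image gradient.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n), so that C @ C^T is the identity."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(x):
    """2-D DCT of an H x W grid: X = C_h @ x @ C_w^T."""
    ch, cw = dct_matrix(len(x)), dct_matrix(len(x[0]))
    return matmul(matmul(ch, x), transpose(cw))

def idct2(X):
    """Inverse 2-D DCT: x = C_h^T @ X @ C_w (valid because C is orthonormal)."""
    ch, cw = dct_matrix(len(X)), dct_matrix(len(X[0]))
    return matmul(matmul(transpose(ch), X), cw)

def high_pass_alignment_loss(feat_adv, feat_tgt, cutoff=1):
    """Assumed loss: mean squared difference of DCT coefficients with u + v >= cutoff.

    Coefficients below the cutoff (the low-frequency, global structure) are
    ignored, so the loss concentrates on the higher-frequency band.
    """
    A, T = dct2(feat_adv), dct2(feat_tgt)
    total, count = 0.0, 0
    for u in range(len(A)):
        for v in range(len(A[0])):
            if u + v >= cutoff:
                total += (A[u][v] - T[u][v]) ** 2
                count += 1
    return total / count if count else 0.0

def low_pass_gradient(grad, sigma=0.25):
    """Assumed regularizer: Gaussian low-pass weight on DCT coefficients.

    The weight depends only on the frequency coordinate (u, v) and the grid
    size, never on statistics of the surrogate model or of the gradient.
    """
    h, w = len(grad), len(grad[0])
    G = dct2(grad)
    for u in range(h):
        for v in range(w):
            r2 = (u / h) ** 2 + (v / w) ** 2  # squared normalized frequency radius
            G[u][v] *= math.exp(-r2 / (2 * sigma ** 2))
    return idct2(G)

# Hypothetical usage: a high-frequency checkerboard gradient is damped.
checker = [[(-1.0) ** (i + j) for j in range(4)] for i in range(4)]
smoothed = low_pass_gradient(checker)
```

In this sketch a constant gradient passes through `low_pass_gradient` unchanged (only its zero-frequency coefficient is non-zero, and that weight is 1), while a checkerboard pattern loses energy. Likewise, two feature grids that differ only by a constant offset give a near-zero `high_pass_alignment_loss` when `cutoff=1`, because the offset lives entirely in the excluded zero-frequency coefficient.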

Problem

Research questions and friction points this paper is trying to address.

transferable attacks

multimodal large language models

adversarial transferability

closed-source MLLMs

visual focus

Innovation

Methods, ideas, or system contributions that make the work stand out.

frequency-domain regularization

adversarial transferability

multimodal large language models