🤖 AI Summary
This work addresses the challenge of performing universal, goal-directed adversarial attacks against closed-source multimodal large language models (MLLMs) in black-box settings. To this end, the authors propose MCRMO-Attack, a novel method that integrates attention-guided multi-crop aggregation, alignability-gated token routing, and a meta-learning optimization framework that learns cross-target perturbation priors. This approach enables the first universal targeted attack applicable to arbitrary inputs without requiring model access or task-specific tuning. Experimental results demonstrate that MCRMO-Attack significantly outperforms existing universal attack baselines, improving the targeted attack success rate on unseen images by 23.7% on GPT-4o and by 19.9% on Gemini-2.0.
📝 Abstract
Targeted adversarial attacks on closed-source multimodal large language models (MLLMs) have been increasingly explored under black-box transfer, yet prior methods are predominantly sample-specific and offer limited reusability across inputs. We instead study a more stringent setting, Universal Targeted Transferable Adversarial Attacks (UTTAA), where a single perturbation must consistently steer arbitrary inputs toward a specified target across unknown commercial MLLMs. Naively adapting existing sample-wise attacks to this universal setting faces three core difficulties: (i) target supervision becomes high-variance due to target-crop randomness; (ii) token-wise matching is unreliable because universality suppresses the image-specific cues that would otherwise anchor alignment; and (iii) few-source per-target adaptation is highly sensitive to initialization, which degrades attainable performance. In this work, we propose MCRMO-Attack, which stabilizes supervision via Multi-Crop Aggregation with an Attention-Guided Crop, improves token-level reliability through alignability-gated Token Routing, and meta-learns a cross-target perturbation prior that yields stronger per-target solutions. Across commercial MLLMs, we improve the unseen-image attack success rate by 23.7% on GPT-4o and 19.9% on Gemini-2.0 over the strongest universal baseline.
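The optimization structure described in the abstract — multi-crop-aggregated supervision, per-target adaptation of a universal perturbation, and a meta-learned cross-target prior — can be illustrated with a deliberately toy sketch. Everything below is an assumption for illustration only: a random linear map stands in for the surrogate encoder, 1-D arrays stand in for images, and a Reptile-style pull toward adapted solutions stands in for the meta-learner. The paper's attention-guided cropping and alignability-gated token routing are not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C, E_DIM = 64, 16, 8                  # toy image length, crop length, embedding dim
W = rng.normal(size=(E_DIM, C)) * 0.1    # stand-in linear "surrogate encoder" (assumption)
images = rng.normal(size=(5, D))         # stand-in source images
targets = rng.normal(size=(3, E_DIM))    # stand-in target embeddings

def crop_loss_grad(x, delta, t, start):
    """Squared embedding distance of one crop to the target, and its gradient w.r.t. delta."""
    c = (x + delta)[start:start + C]
    r = W @ c - t
    g = np.zeros(D)
    g[start:start + C] = 2.0 * W.T @ r
    return float(r @ r), g

def adapt(delta0, t, steps=50, lr=0.05, n_crops=4):
    """Per-target adaptation: average the gradient over several random crops per image,
    a crude stand-in for multi-crop aggregation that reduces crop-induced variance."""
    delta = delta0.copy()
    for _ in range(steps):
        g = np.zeros(D)
        for x in images:
            for _ in range(n_crops):
                s = int(rng.integers(0, D - C + 1))
                g += crop_loss_grad(x, delta, t, s)[1]
        delta -= lr * g / (len(images) * n_crops)
        delta = np.clip(delta, -0.5, 0.5)        # L_inf perturbation budget
    return delta

def mean_loss(delta, t):
    """Deterministic evaluation: average loss over every crop of every image."""
    return float(np.mean([crop_loss_grad(x, delta, t, s)[0]
                          for x in images for s in range(D - C + 1)]))

# Reptile-style meta-update of a cross-target prior: after adapting to each target,
# pull the prior toward the adapted perturbation so later adaptations start warmer.
prior = np.zeros(D)
for _ in range(3):
    for t in targets:
        prior += 0.5 * (adapt(prior, t, steps=10) - prior)

t = targets[0]
before = mean_loss(np.zeros(D), t)
after = mean_loss(adapt(prior, t), t)
print(f"mean target loss: {before:.3f} -> {after:.3f}")
```

The sketch only shows the shape of the optimization (a single shared `delta` driven toward a fixed target embedding across inputs and crops, warm-started from a meta-learned prior); it makes no claim about the paper's actual losses, encoders, or hyperparameters.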