Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing adversarial attacks on multimodal large language models (MLLMs), which often rely on proxy models trained under a single learning paradigm, resulting in constrained feature representations and poor transferability of adversarial perturbations. To overcome this, we propose MPCAttack, a multi-paradigm collaborative attack framework that jointly optimizes adversarial examples in a unified feature space by integrating visual and textual semantics. Central to our approach is the Multi-Paradigm Collaborative Optimization (MPCO) strategy, which leverages cross-modal contrastive learning to adaptively balance the weights of different paradigms, thereby mitigating representation bias and expanding the perturbation search space. Extensive experiments demonstrate that MPCAttack significantly outperforms state-of-the-art methods across multiple benchmarks, achieving stronger targeted and untargeted attack performance on both open-source and proprietary MLLMs.

📝 Abstract
The rapid progress of Multi-Modal Large Language Models (MLLMs) has significantly advanced downstream applications. However, this progress also exposes serious transferable adversarial vulnerabilities. Existing adversarial attacks against MLLMs typically rely on surrogate models trained within a single learning paradigm and perform independent optimisation in their respective feature spaces. This straightforward setting naturally restricts the richness of feature representations, limiting the search space and thus the diversity of adversarial perturbations. To address this, we propose a novel Multi-Paradigm Collaborative Attack (MPCAttack) framework to boost the transferability of adversarial examples against MLLMs. In principle, MPCAttack aggregates semantic representations from both visual images and language texts to facilitate joint adversarial optimisation on the aggregated features through a Multi-Paradigm Collaborative Optimisation (MPCO) strategy. By performing contrastive matching on multi-paradigm features, MPCO adaptively balances the importance of different paradigm representations and guides the global perturbation optimisation, effectively alleviating the representation bias. Extensive experimental results on multiple benchmarks demonstrate the superiority of MPCAttack, indicating that our solution consistently outperforms state-of-the-art methods in both targeted and untargeted attacks on open-source and closed-source MLLMs. The code is released at https://github.com/LiYuanBoJNU/MPCAttack.
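The core MPCO idea, weighting several surrogate paradigms and optimising one shared perturbation over their aggregated feature losses, can be sketched roughly as follows. Everything concrete here is a placeholder assumption: the "encoders" are random linear maps standing in for paradigm-specific surrogate models, the squared feature distance stands in for the paper's contrastive matching objective, and the loss-proportional weights only loosely mimic the adaptive balancing described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for surrogate feature extractors trained under
# two different learning paradigms; each is a fixed random linear map,
# used only to illustrate the optimisation loop, not the actual models.
encoders = [rng.standard_normal((8, 16)) for _ in range(2)]
targets = [rng.standard_normal(8) for _ in range(2)]  # target-class features


def paradigm_losses(x):
    """Squared feature-space distance to the target, per paradigm."""
    return np.array([np.sum((W @ x - t) ** 2)
                     for W, t in zip(encoders, targets)])


x_clean = rng.uniform(0.0, 1.0, 16)  # a flattened "image"
eps, alpha, steps = 8 / 255, 2 / 255, 20
x_adv = x_clean.copy()
loss_before = paradigm_losses(x_adv).sum()

for _ in range(steps):
    losses = paradigm_losses(x_adv)
    # Analytic gradient of each paradigm's squared-error loss.
    grads = [2.0 * W.T @ (W @ x_adv - t) for W, t in zip(encoders, targets)]
    # Adaptive balancing: paradigms with larger residual loss receive
    # more weight, a simple proxy for MPCO's collaborative weighting.
    w = losses / losses.sum()
    g = sum(wi * gi for wi, gi in zip(w, grads))
    # Signed-gradient step toward the target features, projected back
    # into the L-infinity epsilon-ball and the valid pixel range.
    x_adv = np.clip(x_adv - alpha * np.sign(g), x_clean - eps, x_clean + eps)
    x_adv = np.clip(x_adv, 0.0, 1.0)

loss_after = paradigm_losses(x_adv).sum()
```

Under this toy setup the jointly weighted update drives the aggregated feature loss down while the perturbation stays within the epsilon budget; the real attack would replace the linear maps with frozen vision/text encoders and the squared error with a cross-modal contrastive objective.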
Problem

Research questions and friction points this paper is trying to address.

Multi-Modal Large Language Models
Adversarial Attack
Transferability
Feature Representation
Multi-Paradigm
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Paradigm Collaborative Attack
Adversarial Transferability
Multi-Modal Large Language Models
Contrastive Feature Matching
Joint Adversarial Optimization