MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models

📅 2025-11-13

📈 Citations: 0

✨ Influential: 0

career value

206K/year

🤖 AI Summary

This paper identifies a novel security vulnerability in Large Vision-Language Models (LVLMs) under multi-target backdoor attacks—whereas prior work focuses on single-trigger, single-target scenarios, we propose the first multi-target backdoor attack framework tailored for LVLMs, enabling simultaneous implantation of multiple semantically independent visual triggers during a single training phase, each precisely steering the model toward a distinct malicious output. Method: We introduce proxy-space partitioning and trigger-prototype anchoring constraints, integrated with latent-space joint optimization, proxy-class mapping, and feature disentanglement to ensure cooperative multi-trigger training and guaranteed feature separability. Results: Extensive experiments demonstrate that our method achieves significantly higher attack success rates than baselines across mainstream benchmarks, exhibits strong cross-dataset generalization, and maintains robustness against state-of-the-art defense mechanisms.

Technology Category

Application Category

📝 Abstract

Recent advances in Large Visual Language Models (LVLMs) have demonstrated impressive performance across various vision-language tasks by leveraging large-scale image-text pretraining and instruction tuning. However, the security vulnerabilities of LVLMs have become increasingly concerning, particularly their susceptibility to backdoor attacks. Existing backdoor attacks focus on single-target attacks, i.e., targeting a single malicious output associated with a specific trigger. In this work, we uncover multi-target backdoor attacks, where multiple independent triggers corresponding to different attack targets are added in a single pass of training, posing a greater threat to LVLMs in real-world applications. Executing such attacks in LVLMs is challenging since there can be many incorrect trigger-target mappings due to severe feature interference among different triggers. To address this challenge, we propose MTAttack, the first multi-target backdoor attack framework for enforcing accurate multiple trigger-target mappings in LVLMs. The core of MTAttack is a novel optimization method with two constraints, namely Proxy Space Partitioning constraint and Trigger Prototype Anchoring constraint. It jointly optimizes multiple triggers in the latent space, with each trigger independently mapping clean images to a unique proxy class while at the same time guaranteeing their separability. Experiments on popular benchmarks demonstrate a high success rate of MTAttack for multi-target attacks, substantially outperforming existing attack methods. Furthermore, our attack exhibits strong generalizability across datasets and robustness against backdoor defense strategies. These findings highlight the vulnerability of LVLMs to multi-target backdoor attacks and underscore the urgent need for mitigating such threats. Code is available at https://github.com/mala-lab/MTAttack.

Problem

Research questions and friction points this paper is trying to address.

Addresses multi-target backdoor attacks in vision-language models

Solves feature interference between multiple trigger-target mappings

Proposes optimization method for accurate multiple trigger attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-target backdoor attack framework for LVLMs

Optimization method with two novel constraints

Jointly optimizes multiple triggers in latent space

🔎 Similar Papers

ImgTrojan: Jailbreaking Vision-Language Models with ONE Image