Cross-Modal Backdoors in Multimodal Large Language Models

📅 2026-05-08

🤖 AI Summary
This work targets a security blind spot in multimodal large language models (MLLMs): their lightweight connectors. It proposes a cross-modal backdoor attack that poisons only the connector. The poisoning uses a single seed sample and its augmented variants from one modality, yet the resulting backdoor can be triggered by inputs from other modalities. The attack first associates a compact latent region with a malicious target output. It then extracts a malicious centroid from the poisoned latent representations to serve as an anchor. Finally, it optimizes inputs on the attacker side to steer them toward that anchor, without full-model access or repeated API queries. The authors evaluate the attack on connector-based architectures such as PandaGPT and NExT-GPT. They report attack success rates of up to 99.9% in same-modality settings and above 95.0% in most cross-modal settings under bounded perturbations. The poisoned connector keeps a weight cosine similarity above 0.97 relative to benign connectors and produces negligible leakage on clean inputs. Existing defenses fail to mitigate the attack without substantial utility degradation.
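The attacker-side steps summarized above can be illustrated with a toy sketch. These steps are: average the poisoned latents into a centroid, then optimize an input toward it under a perturbation budget. This is not the authors' implementation. The linear `connector` map, the L∞ budget `eps`, the step size `lr`, and all helper names are hypothetical. Plain projected gradient descent on the squared latent distance stands in as a generic optimizer. Real MLLM encoders and connectors are nonlinear, and the paper's objective, perturbation norm, and optimizer may differ.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def connector(W: Matrix, x: Vector) -> Vector:
    """Hypothetical toy stand-in for encoder + connector: z = W x (linear)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def centroid(latents: List[Vector]) -> Vector:
    """Component-wise mean of latent vectors, used here as the anchor."""
    n = len(latents)
    return [sum(col) / n for col in zip(*latents)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two latent vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def steer_toward_anchor(W: Matrix, x0: Vector, anchor: Vector,
                        eps: float = 0.8, lr: float = 0.1,
                        iters: int = 100) -> Vector:
    """Projected gradient descent on ||W (x0 + d) - anchor||^2 with |d_j| <= eps.

    With lr <= 1 / (2 * largest eigenvalue of W^T W), no step increases the
    loss; the defaults satisfy this for the toy W below (eigenvalue 1.5).
    """
    delta = [0.0] * len(x0)
    for _ in range(iters):
        x = [a + d for a, d in zip(x0, delta)]
        residual = [z - c for z, c in zip(connector(W, x), anchor)]
        # Gradient w.r.t. x of the squared distance: 2 * W^T * residual.
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(x0))]
        # Gradient step on the perturbation, then clip into the L-inf ball.
        delta = [max(-eps, min(eps, d - lr * g)) for d, g in zip(delta, grad)]
    return [a + d for a, d in zip(x0, delta)]


# Usage: latents of a toy "seed" and two "augmented variants" define the anchor;
# an input standing in for another modality is then nudged toward it.
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5]]
seed_and_variants = [[2.0, 1.0, 0.0], [2.2, 0.8, 0.0], [1.8, 1.2, 0.0]]
anchor = centroid([connector(W, x) for x in seed_and_variants])  # ~ [2.0, 1.0]
x_other = [0.0, 0.0, 0.0]
x_adv = steer_toward_anchor(W, x_other, anchor)
print(sq_dist(connector(W, x_other), anchor), sq_dist(connector(W, x_adv), anchor))
```

The clipping step reflects the "bounded perturbations" in the abstract. In this toy, `eps = 0.8` is too small to reach the anchor exactly, so the input only moves toward the closest point the budget allows.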
📝 Abstract
Developers increasingly construct multimodal large language models (MLLMs) by assembling pretrained components, introducing supply-chain attack surfaces. Existing security research primarily focuses on poisoning backbones such as encoders or large language models (LLMs), while the security risks of lightweight connectors remain unexplored. In this work, we propose a novel cross-modal backdoor attack that exploits this overlooked vulnerability. By poisoning only the connector using a single seed sample and several augmented variants from one modality, the adversary can subsequently activate the backdoor using inputs from other modalities. To achieve this, we first poison the connector to associate a compact latent region with a malicious target output. To activate the backdoor from other modalities, we further extract a malicious centroid from the poisoned latent representations and perform input-side optimization to steer inputs toward this latent anchor, without requiring repeated API queries or full-model access. Extensive evaluations on representative connector-based MLLM architectures, including PandaGPT and NExT-GPT, demonstrate both the effectiveness and cross-modal transferability of the proposed attack. The attack achieves up to 99.9% attack success rate (ASR) in same-modality settings, while most cross-modal settings exceed 95.0% ASR under bounded perturbations. Moreover, the attack remains highly stealthy, producing negligible leakage on clean inputs, and maintaining weight-cosine similarity above 0.97 relative to benign connectors. We further show that existing defense strategies fail to effectively mitigate this threat without incurring substantial utility degradation. These findings reveal a fundamental vulnerability in multimodal alignment: a single compromised connector can establish a reusable latent-space backdoor pathway across modalities, highlighting the need for safer modular MLLM design.
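The weight-cosine figure compares a poisoned connector with a benign one. The following is a minimal sketch. It assumes the similarity is taken over all connector parameters flattened into one vector, although the paper may instead compute it per layer and average. The parameter names, the dictionary layout, and the values below are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical layout: parameter name -> 2-D list of floats
# (a bias vector is stored as a single-row matrix).
Params = Dict[str, List[List[float]]]


def flatten(params: Params) -> List[float]:
    """Concatenate all parameters into one vector, in sorted-name order."""
    return [v for name in sorted(params) for row in params[name] for v in row]


def weight_cosine(benign: Params, poisoned: Params) -> float:
    """Cosine similarity between two connectors' flattened weights."""
    if sorted(benign) != sorted(poisoned):
        raise ValueError("connectors must have the same parameter names")
    a, b = flatten(benign), flatten(poisoned)
    if len(a) != len(b):
        raise ValueError("connectors must have the same parameter shapes")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for all-zero weights")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Usage with toy 2x2 connectors; only one weight entry differs.
benign = {"proj.weight": [[1.0, 0.0], [0.0, 1.0]], "proj.bias": [[0.0, 0.0]]}
poisoned = {"proj.weight": [[1.0, 0.1], [0.0, 1.0]], "proj.bias": [[0.0, 0.0]]}
print(round(weight_cosine(benign, poisoned), 4))  # 0.9975
```

A small, targeted change leaves the flattened vectors nearly parallel. A cosine near 1 therefore suggests only that the weights moved little, not that the connector is benign. This is consistent with the abstract citing values above 0.97 as evidence of stealth.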
Problem

Research questions and friction points this paper is trying to address.

cross-modal backdoor
multimodal large language models
connector vulnerability
supply-chain attack
latent-space backdoor
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal backdoor
multimodal large language models
connector poisoning
latent-space attack
supply-chain security