A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models

πŸ“… 2026-04-06
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the vulnerability of multimodal large language models to backdoor attacks during supervised fine-tuning, where existing defenses struggle to maintain both robustness and standard performance under low poisoning rates. The authors propose a unified defense framework that jointly enhances image patch-level representations and regularizes cross-view outputs, thereby suppressing anomalous model responses to trigger patterns at both feature and output distribution levels. Leveraging the invariance of backdoors to non-semantic perturbations, the method imposes targeted constraints while incorporating output entropy control to prevent over-suppression of legitimate generation capabilities. Extensive experiments across three models, two task types, and six attack configurations demonstrate that the approach substantially reduces attack success rates without compromising text generation quality.
πŸ“ Abstract
Multimodal large language models have become an important infrastructure for the unified processing of visual and linguistic tasks. However, such models are highly susceptible to backdoor implantation during supervised fine-tuning and will reliably output the attacker's predefined harmful responses once a specific trigger pattern is activated. The core challenge of backdoor defense lies in suppressing attack success under low poisoning ratios while preserving the model's normal generation ability. These two objectives are inherently conflicting: strong suppression often degrades benign performance, whereas weak regularization fails to mitigate backdoor behaviors. To this end, we propose a unified defense framework based on patch augmentation and cross-view regularization, which simultaneously constrains the model's anomalous responses to trigger patterns at both the feature-representation and output-distribution levels. Specifically, patch-level data augmentation is combined with cross-view output-difference regularization to exploit the fact that backdoor responses are abnormally invariant to non-semantic perturbations, proactively pulling apart the output distributions of the original and perturbed views and thereby significantly suppressing the backdoor trigger success rate. At the same time, an output entropy constraint prevents over-suppression of the model during defense, preserving the quality of normal instruction-following generation. Experimental results across three models, two tasks, and six attacks show that the proposed defense effectively reduces the attack success rate while maintaining a high level of normal text generation capability. Our work enables the secure, controlled deployment of large-scale multimodal models in realistic low-frequency poisoning and covert triggering scenarios.
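The mechanism the abstract describes can be sketched as a training objective with three parts: the standard task loss, a cross-view term that pushes apart the output distributions of the original and patch-perturbed views (exploiting the abnormal invariance of backdoor responses to non-semantic perturbations), and an entropy cap that guards against over-suppression. This is a minimal illustrative sketch, not the authors' implementation; the patch-shuffle perturbation, the KL-based divergence, the hinge-style entropy penalty, and all weights (`lambda_cv`, `lambda_ent`, `tau`) are assumptions.

```python
import torch
import torch.nn.functional as F

def patch_shuffle(images, patch=16):
    """Non-semantic perturbation (assumed): randomly permute image patches."""
    b, c, h, w = images.shape
    ph, pw = h // patch, w // patch
    # split into (ph*pw) patches of size patch x patch
    patches = images.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.contiguous().view(b, c, ph * pw, patch, patch)
    patches = patches[:, :, torch.randperm(ph * pw)]       # shuffle patches
    patches = patches.view(b, c, ph, pw, patch, patch)
    # stitch patches back into a full image
    return patches.permute(0, 1, 2, 4, 3, 5).contiguous().view(b, c, h, w)

def defense_loss(ce_loss, logits_orig, logits_aug,
                 lambda_cv=0.5, lambda_ent=0.1, tau=3.0):
    """Task loss minus (clamped) cross-view divergence, plus an entropy cap."""
    p = F.softmax(logits_orig, dim=-1)
    log_p = F.log_softmax(logits_orig, dim=-1)
    log_q = F.log_softmax(logits_aug, dim=-1)
    # KL(p || q) between original-view and perturbed-view output distributions;
    # subtracting it *maximizes* the divergence, pulling the two views apart
    cv = F.kl_div(log_q, p, reduction="batchmean")
    # entropy of the original-view outputs; hinge penalizes only entropy above
    # the budget tau, so normal generation is not over-suppressed (assumption)
    ent = -(p * log_p).sum(dim=-1).mean()
    ent_pen = F.relu(ent - tau)
    return ce_loss - lambda_cv * torch.clamp(cv, max=5.0) + lambda_ent * ent_pen
```

The clamp on the cross-view term is a safeguard in this sketch only: maximizing an unbounded divergence would otherwise dominate the task loss.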
Problem

Research questions and friction points this paper is trying to address.

backdoor defense
multimodal large language models
low poisoning ratio
attack success rate
normal generation ability
Innovation

Methods, ideas, or system contributions that make the work stand out.

patch-based augmentation
cross-view regularization
backdoor defense
multimodal large language models
output entropy constraint
Tianmeng Fang
School of Computing and Information Systems, Singapore Management University, Singapore, 178902, Singapore
Yong Wang
Professor of Computer Science, Ocean University of China
Software Engineering · Operational Research · Machine Learning
Zetai Kong
Faculty of Arts, The University of Melbourne, Melbourne, Carlton VIC 3053, Australia
Zengzhen Su
School of Big Data and Statistics, Anhui University, Hefei, 230601, Anhui Province, PR China
Jun Wang
Department of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, 710049, Shaanxi Province, PR China
Chengjin Yu
School of Big Data and Statistics, Anhui University, Hefei, 230601, Anhui Province, PR China
Wei Wang
Sun Yat-sen University
mathematical logic