From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Online Test-Time Backdoor Defense

📅 2026-01-27
🤖 AI Summary
This work addresses the vulnerability of deep neural networks to backdoor attacks and the limited robustness of existing test-time defenses, which often rely on internal mechanisms of the compromised model. To overcome this limitation, the authors propose a novel paradigm based on external semantic auditing, leveraging a general-purpose vision-language model (VLM) as an independent semantic gatekeeper for model-agnostic, online backdoor detection and defense. The proposed PRISM framework dynamically refines visual prototypes through a hybrid VLM teacher and employs an adaptive routing mechanism with statistical boundary monitoring to calibrate decision thresholds in real time. Evaluated across 17 datasets and 11 attack types, PRISM achieves state-of-the-art performance, reducing attack success rates to below 1% on CIFAR-10 while simultaneously improving clean-sample accuracy.

📝 Abstract
Deep Neural Networks remain inherently vulnerable to backdoor attacks. Traditional test-time defenses largely operate under the paradigm of internal diagnosis, using methods such as model repair or input robustness, yet these approaches are often fragile under advanced attacks because they remain entangled with the victim model's corrupted parameters. We propose a paradigm shift from Internal Diagnosis to External Semantic Auditing, arguing that effective defense requires decoupling safety from the victim model via an independent, semantically grounded auditor. To this end, we present a framework harnessing Universal Vision-Language Models (VLMs) as evolving semantic gatekeepers. We introduce PRISM (Prototype Refinement & Inspection via Statistical Monitoring), which overcomes the domain gap of general VLMs through two key mechanisms: a Hybrid VLM Teacher that dynamically refines visual prototypes online, and an Adaptive Router powered by statistical margin monitoring to calibrate gating thresholds in real time. Extensive evaluation across 17 datasets and 11 attack types demonstrates that PRISM achieves state-of-the-art performance, suppressing Attack Success Rate to <1% on CIFAR-10 while improving clean accuracy, establishing a new standard for model-agnostic, externalized security.
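
The abstract names two mechanisms, online prototype refinement and margin-based threshold calibration, without giving their details. The following is a minimal sketch of how such an external auditor could be wired together, not the PRISM implementation. Everything in it is an assumption made for illustration: the `MarginAuditor` class, the cosine-similarity scoring, the exponential-moving-average prototype update, the "mean minus k standard deviations" threshold rule, and all parameter values. The embeddings stand in for the output of a VLM image encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class MarginAuditor:
    """Illustrative external auditor (hypothetical; not the PRISM code)."""

    def __init__(self, prototypes, k=1.0, momentum=0.9,
                 init_threshold=0.2, warmup=20):
        # prototypes: class label -> embedding; at least two classes needed.
        self.prototypes = {c: list(p) for c, p in prototypes.items()}
        self.k = k                        # assumed width of the margin band
        self.momentum = momentum          # assumed EMA factor for prototypes
        self.init_threshold = init_threshold
        self.warmup = warmup
        self.n, self.mean, self.m2 = 0, 0.0, 0.0  # running margin statistics

    def threshold(self):
        """Fixed threshold during warm-up, then mean - k * std of margins."""
        if self.n < max(self.warmup, 2):
            return self.init_threshold
        std = math.sqrt(self.m2 / (self.n - 1))
        return self.mean - self.k * std

    def _observe(self, margin):
        """Welford's online update of the margin mean and variance."""
        self.n += 1
        delta = margin - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (margin - self.mean)

    def audit(self, embedding, victim_label):
        """Return (final_label, overridden) for one test-time input."""
        ranked = sorted(
            ((cosine(embedding, p), c) for c, p in self.prototypes.items()),
            reverse=True,
        )
        (s1, best), (s2, _) = ranked[0], ranked[1]
        margin = s1 - s2
        confident = margin >= self.threshold()
        self._observe(margin)
        if confident:
            # Refine the winning prototype toward the new embedding.
            m = self.momentum
            proto = self.prototypes[best]
            self.prototypes[best] = [
                m * p + (1 - m) * e for p, e in zip(proto, embedding)
            ]
        # Override the victim model only when the auditor is confident
        # and disagrees with it; otherwise defer to the victim's label.
        overridden = confident and best != victim_label
        return (best if overridden else victim_label), overridden


auditor = MarginAuditor({"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
print(auditor.audit([0.9, 0.1], victim_label="dog"))  # clear disagreement
print(auditor.audit([0.5, 0.5], victim_label="dog"))  # ambiguous input
```

In this sketch the auditor never reads the victim model's parameters; it only compares the victim's predicted label with its own prototype-based judgment, which is the decoupling the abstract argues for. How PRISM actually refines prototypes and sets its gating thresholds is not specified here and may differ substantially.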
Problem

Research questions and friction points this paper is trying to address.

backdoor defense
test-time security
model vulnerability
external auditing
vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models
Backdoor Defense
External Auditing
Test-Time Adaptation
Model-Agnostic Security
Binyan Xu
The Chinese University of Hong Kong, Hong Kong
Fan Yang
The Chinese University of Hong Kong, Hong Kong
Xilin Dai
Zhejiang University, China
Di Tang
Sun Yat-sen University, China
Kehuan Zhang
The Chinese University of Hong Kong
Security of Computer systems, Web, Mobile, Cloud, Embedded System