From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Online Test-Time Backdoor Defense

📅 2026-01-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the vulnerability of deep neural networks to backdoor attacks and the limited robustness of existing test-time defenses, which often rely on internal mechanisms of the compromised model. To overcome this limitation, the authors propose a novel paradigm based on external semantic auditing, leveraging a general-purpose vision-language model (VLM) as an independent semantic gatekeeper for model-agnostic, online backdoor detection and defense. The proposed PRISM framework dynamically refines visual prototypes through a hybrid VLM teacher and employs an adaptive routing mechanism with statistical boundary monitoring to calibrate decision thresholds in real time. Evaluated across 17 datasets and 11 attack types, PRISM achieves state-of-the-art performance, reducing attack success rates to below 1% on CIFAR-10 while simultaneously improving clean-sample accuracy.

📝 Abstract

Deep Neural Networks remain inherently vulnerable to backdoor attacks. Traditional test-time defenses largely operate under the paradigm of internal diagnosis, using methods such as model repair or input robustness, yet these approaches are often fragile under advanced attacks because they remain entangled with the victim model's corrupted parameters. We propose a paradigm shift from Internal Diagnosis to External Semantic Auditing, arguing that effective defense requires decoupling safety from the victim model via an independent, semantically grounded auditor. To this end, we present a framework harnessing Universal Vision-Language Models (VLMs) as evolving semantic gatekeepers. We introduce PRISM (Prototype Refinement & Inspection via Statistical Monitoring), which overcomes the domain gap of general VLMs through two key mechanisms: a Hybrid VLM Teacher that dynamically refines visual prototypes online, and an Adaptive Router powered by statistical margin monitoring to calibrate gating thresholds in real time. Extensive evaluation across 17 datasets and 11 attack types demonstrates that PRISM achieves state-of-the-art performance, suppressing Attack Success Rate to <1% on CIFAR-10 while improving clean accuracy, establishing a new standard for model-agnostic, externalized security.
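The two mechanisms named in the abstract (online prototype refinement and a router whose gating threshold follows margin statistics) can be illustrated with a toy sketch. This is not the authors' implementation: the class name `ExternalAuditor`, the cosine-similarity margin, the exponential-moving-average prototype update, the mean-minus-k-sigma threshold, and all default parameters are assumptions chosen for illustration. Embeddings are assumed to come from some external VLM image encoder and are passed in as plain lists of floats.

```python
import math
from statistics import mean, pstdev


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


class ExternalAuditor:
    """Hypothetical auditor that checks a victim model's label against
    class prototypes held outside the victim model. Needs >= 2 classes."""

    def __init__(self, prototypes, momentum=0.9, k=2.0, warmup=5):
        self.prototypes = {c: normalize(p) for c, p in prototypes.items()}
        self.momentum = momentum  # assumed EMA weight for prototype refinement
        self.k = k                # assumed number of std-devs below the mean
        self.warmup = warmup      # accepted samples needed before adapting
        self.margins = []         # margins of samples accepted as clean

    def margin(self, embedding, label):
        """Similarity to the claimed class minus the best competing class."""
        sims = {c: cosine(embedding, p) for c, p in self.prototypes.items()}
        best_other = max(s for c, s in sims.items() if c != label)
        return sims[label] - best_other

    def threshold(self):
        """Gating threshold from running margin statistics; never below 0,
        so a label the auditor disagrees with is never accepted."""
        if len(self.margins) < self.warmup:
            return 0.0
        return max(0.0, mean(self.margins) - self.k * pstdev(self.margins))

    def audit(self, embedding, victim_label):
        """Return (final_label, accepted)."""
        m = self.margin(embedding, victim_label)
        if m >= self.threshold():
            # Accepted: record the margin and refine the class prototype.
            self.margins.append(m)
            old, new = self.prototypes[victim_label], normalize(embedding)
            mixed = [self.momentum * o + (1 - self.momentum) * x
                     for o, x in zip(old, new)]
            self.prototypes[victim_label] = normalize(mixed)
            return victim_label, True
        # Rejected: fall back to the auditor's own nearest prototype.
        fallback = max(self.prototypes,
                       key=lambda c: cosine(embedding, self.prototypes[c]))
        return fallback, False


if __name__ == "__main__":
    auditor = ExternalAuditor({"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
    print(auditor.audit([0.9, 0.1], "cat"))  # ('cat', True)
    print(auditor.audit([0.1, 0.9], "cat"))  # ('dog', False)
```

In this sketch the second input looks like a dog to the auditor while the victim model claims "cat", which is the kind of disagreement a triggered backdoor would produce, so the auditor overrides the label. Because the prototypes and the threshold live entirely outside the victim model, the check makes no use of the victim's possibly corrupted parameters.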

Problem

Research questions and friction points this paper is trying to address.

backdoor defense

test-time security

model vulnerability

external auditing

vision-language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models

Backdoor Defense

External Auditing