Lie Detector: Unified Backdoor Detection via Cross-Examination Framework

📅 2025-03-21
🤖 AI Summary
Detecting backdoors in multimodal large language models (MLLMs) trained under semi-honest outsourcing, across diverse learning paradigms (supervised, semi-supervised, autoregressive), remains an open challenge. Method: We propose the first unified, architecture- and paradigm-agnostic backdoor detection framework, integrating Centered Kernel Alignment (CKA), cross-examination of inconsistencies between models trained by two independent service providers, and backdoor fine-tuning sensitivity analysis. Contribution/Results: Our approach is the first to effectively identify backdoor triggers in MLLMs. Compared with existing statistical methods, it markedly improves robustness and generalizability: detection accuracy rises by 5.4%, 1.6%, and 11.9% across three benchmark tasks, while false positive rates drop substantially. The framework is lightweight, deployment-ready, and highly compatible, offering resource-constrained organizations a practical, scalable defense against backdoor attacks in outsourced model training.

📝 Abstract
Institutions with limited data and computing resources often outsource model training to third-party providers in a semi-honest setting, assuming adherence to prescribed training protocols with a pre-defined learning paradigm (e.g., supervised or semi-supervised learning). However, this practice can introduce severe security risks, as adversaries may poison the training data to embed backdoors into the resulting model. Existing detection approaches predominantly rely on statistical analyses, which often fail to remain accurate across different learning paradigms. To address this challenge, we propose a unified backdoor detection framework in the semi-honest setting that exploits cross-examination of model inconsistencies between two independent service providers. Specifically, we integrate centered kernel alignment (CKA) to enable robust feature similarity measurements across different model architectures and learning paradigms, thereby facilitating precise recovery and identification of backdoor triggers. We further introduce backdoor fine-tuning sensitivity analysis to distinguish backdoor triggers from adversarial perturbations, substantially reducing false positives. Extensive experiments demonstrate that our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines across supervised, semi-supervised, and autoregressive learning tasks, respectively. Notably, it is the first to effectively detect backdoors in multimodal large language models, further highlighting its broad applicability and advancing secure deep learning.
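The cross-examination idea above can be sketched in a few lines: query two independently trained models on the same probe inputs and flag samples where their predictions diverge sharply, since a backdoored model should disagree with a clean one on trigger-bearing inputs while agreeing on benign ones. This is a minimal illustrative sketch, not the paper's exact criterion; the symmetric-KL measure and the threshold value are assumptions chosen for clarity.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_examine(logits_a, logits_b, threshold=0.5):
    """Flag inputs where two providers' models disagree strongly.

    logits_a, logits_b: (n_samples, n_classes) class logits from providers
    A and B on the same probe inputs. Returns a boolean mask marking
    suspicious (potentially trigger-bearing) samples. The symmetric KL
    divergence and the 0.5 threshold are illustrative choices.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    eps = 1e-12  # avoid log(0) on near-one-hot distributions
    kl_pq = (p * np.log((p + eps) / (q + eps))).sum(axis=1)
    kl_qp = (q * np.log((q + eps) / (p + eps))).sum(axis=1)
    score = 0.5 * (kl_pq + kl_qp)  # symmetric per-sample inconsistency
    return score > threshold
```

On inputs where both models agree the score stays near zero; a sample on which the two models confidently predict different classes is flagged.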
Problem

Research questions and friction points this paper is trying to address.

Detect backdoors in models trained by semi-honest third-party providers
Unify detection across different learning paradigms and architectures
Distinguish backdoor triggers from adversarial perturbations accurately
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-examination framework for unified backdoor detection
Centered kernel alignment (CKA) for robust feature similarity across architectures
Fine-tuning sensitivity analysis to reduce false positives
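The CKA similarity that underpins the feature comparison can be computed directly from two models' activation matrices. Below is a minimal sketch of linear CKA (the standard Kornblith et al. formulation); the paper may use a kernelized variant, so treat this as an assumption-laden illustration rather than the authors' exact implementation.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X: (n_samples, d1) features from model A on a probe set
    Y: (n_samples, d2) features from model B on the same probe set
    Returns a similarity in [0, 1]; invariant to orthogonal transforms
    and isotropic scaling, so it works across differing architectures.
    """
    # Center each feature dimension over the sample axis
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, "fro") ** 2
    denominator = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return numerator / denominator
```

Because the score is invariant to feature dimensionality and rotation, it allows the framework to compare representations from two providers' models even when their architectures differ.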