CEPA: Consensus Embedded Perturbation for Agnostic Detection and Inversion of Backdoors

📅 2024-02-03

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses backdoor attacks on deep neural network classifiers that use diverse incorporation mechanisms. It proposes a universal backdoor detection and reverse-engineering method that requires neither training data nor prior knowledge of the attack. The core problem is the lack of robust, attack-agnostic approaches that can reconstruct triggers and identify target classes without access to clean data or attack specifications. The method brings consensus-based perturbation of embedded features into backdoor inversion and has three key components: (i) embedding-layer perturbation modeling, (ii) consensus-clustering-guided reverse optimization, and (iii) unsupervised target-class discrimination. Together these enable simultaneous trigger reconstruction and interpretable target-class identification. Evaluated on CIFAR-10 and CIFAR-100 against BadNets, Blend, SIG, and other state-of-the-art attacks, the approach achieves over 96% detection accuracy, surpassing existing state-of-the-art defenses, and shows strong robustness and generalization across attacks.
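A common way to single out a backdoor target class without labels, used for example by Neural Cleanse, is a median-absolute-deviation (MAD) outlier test over one statistic per candidate class. The sketch below assumes such per-class scores (for instance, the norm of an inverted perturbation) are already computed. The function names, the 2.0 threshold, and the choice of score are illustrative assumptions, not CEPA's published discrimination criterion.

```python
import numpy as np


def anomaly_index(class_scores):
    """MAD-based anomaly index, one value per candidate target class.

    class_scores: 1-D sequence with one score per class (hypothetical input,
    e.g. the norm of the smallest perturbation that flips inputs to that class).
    Returns |score - median| / (1.4826 * MAD).
    """
    s = np.asarray(class_scores, dtype=float)
    med = np.median(s)
    mad = np.median(np.abs(s - med))
    # 1.4826 makes MAD a consistent estimate of the std for Gaussian data;
    # the tiny epsilon avoids division by zero when all scores are equal.
    return np.abs(s - med) / (1.4826 * mad + 1e-12)


def flag_target_class(class_scores, threshold=2.0):
    """Return the most anomalous class with an unusually SMALL score, or None."""
    s = np.asarray(class_scores, dtype=float)
    idx = anomaly_index(s)
    # A backdoor target is assumed to need an unusually small perturbation,
    # so only scores below the median are considered.
    below = s < np.median(s)
    candidates = np.where(below & (idx > threshold))[0]
    if candidates.size == 0:
        return None
    return int(candidates[np.argmax(idx[candidates])])
```

For example, with scores `[5.1, 4.9, 5.3, 0.8, 5.0, 4.8, 5.2, 5.1, 4.95, 5.05]`, class 3 stands far below the others and is flagged, while a set of tightly clustered scores yields `None`.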


📝 Abstract

A variety of defenses have been proposed against Trojans planted in (backdoor attacks on) deep neural network (DNN) classifiers. Backdoor-agnostic methods seek to reliably detect and/or mitigate backdoors irrespective of the incorporation mechanism used by the attacker, while inversion methods explicitly assume one. In this paper, we describe a new detector that relies on embedded feature representations to estimate (invert) the backdoor and to identify its target class, can operate without access to the training dataset, and is highly effective for various incorporation mechanisms (i.e., is backdoor agnostic). Our detection approach is evaluated, and found to compare favorably, against an array of published defenses for a variety of attacks on the CIFAR-10 and CIFAR-100 image-classification domains.
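To make "estimating (inverting) the backdoor from embedded feature representations" concrete, here is a minimal sketch assuming a linear classification head on top of fixed embeddings. For each candidate target class it searches, by gradient descent, for one shared perturbation of the embeddings that drives them to that class while penalizing its squared L2 norm. The linear head, the names `invert_embedding_perturbation` and `per_class_inversion_norms`, and the hyperparameters are hypothetical; how CEPA obtains embeddings without the training set, and its consensus step, are not reproduced here.

```python
import numpy as np


def softmax(logits):
    # Subtract the row max for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def invert_embedding_perturbation(Z, W, b, target, lam=0.01, lr=0.5, steps=500):
    """Find one shared perturbation `delta` in embedding space pushing all rows
    of Z toward class `target` under the (assumed) linear head
    logits = (Z + delta) @ W.T + b.

    Minimizes mean cross-entropy to `target` plus lam * ||delta||^2 with plain
    gradient descent. Returns (delta, fraction of rows classified as target).
    """
    _, d = Z.shape
    k = W.shape[0]
    onehot = np.zeros(k)
    onehot[target] = 1.0
    delta = np.zeros(d)
    for _ in range(steps):
        p = softmax((Z + delta) @ W.T + b)       # (n, k) class probabilities
        grad_logits = p - onehot                 # d CE / d logits, per row
        # Chain rule through the linear head, averaged over rows, plus L2 term.
        grad = grad_logits.mean(axis=0) @ W + 2.0 * lam * delta
        delta -= lr * grad
    preds = np.argmax((Z + delta) @ W.T + b, axis=1)
    return delta, float(np.mean(preds == target))


def per_class_inversion_norms(Z, W, b, **kw):
    """L2 norm of the inverted perturbation for every candidate target class."""
    return np.array([
        np.linalg.norm(invert_embedding_perturbation(Z, W, b, t, **kw)[0])
        for t in range(W.shape[0])
    ])
```

On a toy head whose class-2 weight row is scaled up (a crude stand-in for a backdoor shortcut), the class-2 inversion norm comes out smallest, since a smaller embedding shift suffices to reach that class; a per-class statistic like this could then feed an unsupervised outlier test.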

Problem

Research questions and friction points this paper is trying to address.

Detecting and mitigating backdoors in DNN classifiers

Operating without access to the training dataset

Remaining effective across different backdoor incorporation mechanisms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses embedded feature representations to invert the backdoor and identify its target class

Requires no access to the training dataset

Effective across various backdoor incorporation mechanisms
