🤖 AI Summary
Existing post-training backdoor detectors lack sensitivity to subtle attacks and are prone to interference from class-intrinsic features. Method: We propose Class Subspace Orthogonalization (CSO), a detection framework that solves a constrained optimization problem using a small set of clean samples; it orthogonalizes against class-specific feature subspaces to suppress intrinsic class separability and thereby decouple and amplify the statistical signal of backdoor triggers. CSO integrates decision-confidence analysis with subspace regularization to enable fine-grained detection of stealthy triggers, mixed-label attacks, and adaptive attacks. Contribution/Results: Experiments demonstrate that CSO significantly improves detection sensitivity—especially for weak triggers and scenarios where non-target classes are highly distinguishable—while reducing false positive rates by up to 32.7%. It outperforms state-of-the-art methods in robustness across diverse attack settings.
📝 Abstract
Most post-training backdoor detection methods rely on attacked models exhibiting extreme outlier detection statistics for the target class of an attack, compared to non-target classes. However, these approaches may fail: (1) when some (non-target) classes are easily discriminable from all others, in which case they may naturally achieve extreme detection statistics (e.g., decision confidence); and (2) when the backdoor is subtle, i.e., when its features are weak relative to intrinsic class-discriminative features. A key observation is that the backdoor target class receives contributions to its detection statistic from both the backdoor trigger and its intrinsic features, whereas non-target classes receive contributions only from their intrinsic features. To achieve more sensitive detectors, we thus propose to suppress intrinsic features while optimizing the detection statistic for a given class. For non-target classes, such suppression drastically reduces the achievable statistic, whereas for the target class the (significant) contribution from the backdoor trigger remains. In practice, we formulate a constrained optimization problem, leveraging a small set of clean examples from a given class, and optimize the detection statistic while orthogonalizing with respect to the class's intrinsic features. We dub this plug-and-play approach Class Subspace Orthogonalization (CSO) and assess it against challenging mixed-label and adaptive attacks.
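The abstract does not specify how the class's intrinsic feature subspace is estimated or how the orthogonalization is carried out. A minimal sketch of the orthogonalization step, under the illustrative assumption that the intrinsic subspace is taken as the top principal directions (via SVD) of clean-sample features, might look like this (function names and the rank parameter `k` are assumptions, not the paper's formulation):

```python
import numpy as np

def intrinsic_subspace(clean_feats: np.ndarray, k: int) -> np.ndarray:
    """Estimate a rank-k intrinsic feature subspace for one class
    from its clean-sample feature vectors (rows of clean_feats)."""
    centered = clean_feats - clean_feats.mean(axis=0)
    # Right singular vectors span the principal feature directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # (d, k) orthonormal basis of the subspace

def orthogonalize(v: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project out the component of v lying in the intrinsic subspace,
    leaving only directions orthogonal to class-intrinsic features."""
    return v - basis @ (basis.T @ v)
```

A detection-statistic optimizer would then restrict its search (e.g., for a candidate trigger direction) to the orthogonal complement returned by `orthogonalize`, so any remaining signal cannot be explained by the class's own discriminative features.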