RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation of existing backdoor defenses in class-imbalanced settings and the absence of reliable poisoned-sample detection mechanisms. To this end, we propose the Randomized Probability Perturbation (RPP) framework, which enables certifiably robust detection of poisoned samples in a black-box setting using only model output probabilities. RPP provides provable guarantees on in-distribution detectability and an upper bound on the false positive rate. Notably, it is the first method to reveal how data imbalance exacerbates model vulnerability to backdoor attacks, and the first certified detection approach tailored to imbalanced data with theoretical guarantees. Extensive experiments across five benchmark datasets, ten attack variants, and twelve baseline methods demonstrate that RPP substantially improves detection accuracy in imbalanced settings.

📝 Abstract
Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date assume balanced data, overlooking the pervasive class imbalance in real-world scenarios that can amplify backdoor threats. This paper presents the first in-depth investigation of how dataset imbalance amplifies backdoor vulnerability, showing that (i) imbalance induces a majority-class bias that increases susceptibility, and (ii) conventional defenses degrade significantly as the imbalance grows. To address this, we propose Randomized Probability Perturbation (RPP), a certified poisoned-sample detection framework that operates in a black-box setting using only model output probabilities. For any inspected sample, RPP determines whether the input has been backdoor-manipulated, while offering provable within-domain detectability guarantees and a probabilistic upper bound on the false positive rate. Extensive experiments on five benchmarks (MNIST, SVHN, CIFAR-10, TinyImageNet, and ImageNet10) covering 10 backdoor attacks and 12 baseline defenses show that RPP achieves significantly higher detection accuracy than state-of-the-art defenses, particularly under dataset imbalance. RPP establishes a theoretical and practical foundation for defending against backdoor attacks in real-world environments with imbalanced data.
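The abstract only states that RPP perturbs model output probabilities and derives a detection decision with a bounded false positive rate; the paper's actual perturbation distribution, test statistic, and certification argument are not reproduced here. As a minimal illustrative sketch of the general idea, one can add Gaussian noise to the output probability vector and measure how often the top class survives the perturbation; trigger-carrying inputs often exhibit anomalously stable, overconfident predictions. All names (`rpp_agreement`, `flag_suspicious`), the noise scale `sigma`, and the threshold `tau` are hypothetical choices, not the authors' method:

```python
import numpy as np

def rpp_agreement(probs, sigma=0.05, n_draws=1000, seed=0):
    """Estimate how often the predicted class survives random Gaussian
    perturbation of the model's output probability vector.

    probs   : 1-D array of class probabilities for the inspected input.
    sigma   : perturbation scale (hypothetical hyperparameter).
    Returns the empirical top-class agreement rate in [0, 1].
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    top = int(np.argmax(probs))                      # original prediction
    noisy = probs + sigma * rng.standard_normal((n_draws, probs.size))
    return float(np.mean(np.argmax(noisy, axis=1) == top))

def flag_suspicious(probs, tau=0.99, **kw):
    """Flag an input whose prediction is anomalously stable under
    perturbation -- a common symptom of trigger-induced overconfidence."""
    return rpp_agreement(probs, **kw) >= tau

# A near-one-hot (overconfident) output keeps its top class under noise,
# while a near-uniform output flips frequently and is left unflagged.
print(flag_suspicious([0.90, 0.05, 0.05]))
print(flag_suspicious([0.34, 0.33, 0.33]))
```

Because each draw is an independent Bernoulli trial, a binomial confidence bound on the agreement rate (e.g. Clopper–Pearson) would let one turn the threshold `tau` into a probabilistic false-positive guarantee, which is plausibly the flavor of certification the paper formalizes.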
Problem

Research questions and friction points this paper is trying to address.

backdoor attacks
dataset imbalance
poisoned-sample detection
class imbalance
deep neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

backdoor attack
dataset imbalance
certified detection
randomized probability perturbation
black-box defense