🤖 AI Summary
Black-box query attacks pose a severe threat to the security of Machine Learning as a Service (MLaaS) systems, while existing defenses often incur high computational overhead or degrade accuracy on clean samples. This paper proposes PuriDefense, a lightweight defense mechanism based on random patch-wise purification that reconstructs the natural image manifold without requiring knowledge of the target model's architecture or gradients, leveraging local implicit function modeling. The authors theoretically show that the randomness injected by purification slows down the convergence of query-based attacks. The method employs an ensemble of lightweight purification models to jointly enhance robustness and generalization. Evaluated on CIFAR-10 and ImageNet, PuriDefense significantly improves resilience against prominent black-box query attacks while incurring minimal purification overhead and preserving clean-sample accuracy.
📝 Abstract
Black-box query-based attacks constitute a significant threat to Machine Learning as a Service (MLaaS) systems because they can generate adversarial examples without access to the target model's architecture or parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or compromise test accuracy on non-adversarial inputs. To address these challenges, we propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purification with an ensemble of lightweight purification models at low inference cost. These models leverage local implicit functions to rebuild the natural image manifold. Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into the purification process. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our purifier-based defense mechanism, demonstrating significant improvements in robustness against query-based attacks.
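To make the idea of random patch-wise purification with an ensemble concrete, the following is a minimal, hypothetical sketch (not the paper's actual code): for each image patch, one purifier is drawn at random from a lightweight ensemble, so repeated queries on the same input yield different purified outputs. The toy purifiers below are simple stand-ins for the paper's implicit-function models.

```python
import numpy as np

def patchwise_purify(image, purifiers, patch_size=8, rng=None):
    """Illustrative randomized patch-wise purification.

    For each patch, a purifier is sampled uniformly from the ensemble
    and applied independently; the injected randomness is what the
    paper argues slows down query-based attack convergence.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    out = image.copy()
    for y in range(0, h, patch_size):
        for x in range(0, w, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            # Randomly pick one lightweight purifier for this patch.
            purifier = purifiers[rng.integers(len(purifiers))]
            out[y:y + patch_size, x:x + patch_size] = purifier(patch)
    return out

# Toy stand-ins for the lightweight purification models.
toy_purifiers = [
    lambda p: p,                          # identity
    lambda p: 0.9 * p + 0.1 * p.mean(),   # shrink toward patch mean
]

image = np.random.rand(32, 32, 3).astype(np.float32)
purified = patchwise_purify(image, toy_purifiers, patch_size=8)
```

In a real deployment the ensemble members would be trained local-implicit-function networks; the random per-patch selection is the key mechanism illustrated here.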