🤖 AI Summary
Black-box query attacks pose a severe threat to the security of Machine Learning as a Service (MLaaS) systems, while existing defenses often incur high computational overhead or degrade accuracy on clean samples. This paper proposes PuriDefense, a lightweight defense mechanism based on random patch-wise purification that reconstructs the natural image manifold without requiring knowledge of the target model's architecture or gradients, leveraging local implicit function modeling. The authors theoretically show that the randomness injected by purification slows down the convergence of query-based attacks. The method employs an ensemble of lightweight purification models to jointly enhance robustness and generalization. Evaluated on CIFAR-10 and ImageNet, PuriDefense significantly improves resilience against prominent black-box query attacks while incurring minimal purification overhead and preserving clean-sample accuracy.
📝 Abstract
Black-box query-based attacks constitute a significant threat to Machine Learning as a Service (MLaaS) systems because they can generate adversarial examples without access to the target model's architecture or parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or compromise test accuracy on non-adversarial inputs. To address these challenges, we propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purification with an ensemble of lightweight purification models at low inference cost. These models leverage local implicit functions to rebuild the natural image manifold. Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into the purification process. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our purifier-based defense mechanism, demonstrating significant improvements in robustness against query-based attacks.
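To make the idea of random patch-wise purification with an ensemble concrete, the following is a minimal, hypothetical sketch (not the paper's actual code): for each image patch, one purifier is drawn at random from a lightweight ensemble, so repeated queries on the same input yield different purified outputs. The toy purifiers below are simple stand-ins for the paper's implicit-function models.

```python
import numpy as np

def patchwise_purify(image, purifiers, patch_size=8, rng=None):
    """Illustrative randomized patch-wise purification.

    For each patch, a purifier is sampled uniformly from the ensemble
    and applied independently; the injected randomness is what the
    paper argues slows down query-based attack convergence.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    out = image.copy()
    for y in range(0, h, patch_size):
        for x in range(0, w, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            # Randomly pick one lightweight purifier for this patch.
            purifier = purifiers[rng.integers(len(purifiers))]
            out[y:y + patch_size, x:x + patch_size] = purifier(patch)
    return out

# Toy stand-ins for the lightweight purification models.
toy_purifiers = [
    lambda p: p,                          # identity
    lambda p: 0.9 * p + 0.1 * p.mean(),   # shrink toward patch mean
]

image = np.random.rand(32, 32, 3).astype(np.float32)
purified = patchwise_purify(image, toy_purifiers, patch_size=8)
```

In a real deployment the ensemble members would be trained local-implicit-function networks; the random per-patch selection is the key mechanism illustrated here.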