🤖 AI Summary
This paper addresses model stealing attacks, in which malicious users replicate commercial API-hosted models via low-cost queries. The authors propose D-ADD, a defense framework that combines an Account-aware Distribution Discrepancy (ADD) detector with random prediction poisoning. The detector is non-parametric and gradient-free: it models each user's local query features as class-conditional multivariate normal distributions and quantifies their statistical deviation from benign behavior at the account level. Coupled with a random prediction poisoning mechanism compatible with both soft- and hard-label settings, D-ADD provides plug-and-play, real-time protection. Evaluated against diverse state-of-the-art model stealing attacks, D-ADD achieves over 92% defense success rate while degrading legitimate users' accuracy by less than 0.5%, significantly outperforming existing defenses. The framework incurs minimal computational overhead, is robust against adaptive attackers, and preserves service transparency, requiring no modifications to the target model or user-facing interfaces.
📝 Abstract
Malicious users attempt to functionally replicate commercial models at low cost by training a clone model on query responses. Preventing such model-stealing attacks in a timely manner, while achieving strong protection and maintaining utility, is challenging. In this paper, we propose a novel non-parametric detector called Account-aware Distribution Discrepancy (ADD) that recognizes queries from malicious users by leveraging account-wise local dependency. We model each class as a multivariate normal distribution (MVN) in the feature space and compute the malicious score as the weighted sum of class-wise distribution discrepancies. The ADD detector is combined with random prediction poisoning to yield a plug-and-play defense module named D-ADD for image classification models. Extensive experiments show that D-ADD achieves strong defense against different types of attacks with little interference in serving benign users, in both soft- and hard-label settings.