🤖 AI Summary
This paper addresses model stealing attacks, in which malicious users replicate commercial API-hosted models via low-cost queries. The authors propose D-ADD, a defense framework that combines an Account-aware Distribution Discrepancy (ADD) detector with random prediction poisoning. The detector is non-parametric and gradient-free: it models each user's local query features as class-conditional multivariate normal distributions and quantifies their statistical deviation from benign behavior at the account level. Coupled with a random prediction poisoning mechanism compatible with both soft- and hard-label settings, D-ADD provides plug-and-play, real-time protection. Evaluated against diverse state-of-the-art model stealing attacks, D-ADD achieves over 92% defense success rate while degrading legitimate users' accuracy by less than 0.5%, significantly outperforming existing defenses. The framework incurs minimal computational overhead, is robust against adaptive attackers, and preserves service transparency, requiring no modifications to the target model or user-facing interfaces.
📝 Abstract
Malicious users attempt to functionally replicate commercial models at low cost by training a clone model on query responses. Preventing such model-stealing attacks in a timely manner, while achieving strong protection and maintaining utility, is challenging. In this paper, we propose a novel non-parametric detector called Account-aware Distribution Discrepancy (ADD) that recognizes queries from malicious users by leveraging account-wise local dependency. We model each class as a multivariate normal distribution (MVN) in the feature space and compute the malicious score as the weighted sum of class-wise distribution discrepancies. The ADD detector is combined with random prediction poisoning to yield a plug-and-play defense module named D-ADD for image classification models. Extensive experiments show that D-ADD achieves strong defense against different types of attacks with little interference in serving benign users, in both soft- and hard-label settings.