Attackers Can Do Better: Over- and Understated Factors of Model Stealing Attacks

📅 2025-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two gaps in the study of model stealing attacks: the unclear mechanisms by which attacker capabilities affect attack outcomes, and the inaccurate threat assessments that result. We systematically disentangle the true contributions of attacker knowledge—including query budget, surrogate data quality, and architectural priors—to substitute model fidelity via controlled-variable attribution analysis and ablation studies across tasks, architectures, and data scales. Key findings: (1) stronger target models are paradoxically more vulnerable to high-fidelity stealing; (2) task complexity, rather than model complexity, better characterizes stealing difficulty; and (3) data quality and architecture compatibility vastly outweigh naive increases in query volume or parameter count. Building on these insights, we propose a data-free distillation strategy that achieves superior attack performance under extremely low query budgets—outperforming state-of-the-art attacks—and reveals that current defenses severely underestimate real-world threats.

📝 Abstract
Machine learning models have been shown to be vulnerable to model stealing attacks, which lead to intellectual property infringement. Among other methods, substitute model training is an all-encompassing attack applicable to any machine learning model whose behaviour can be approximated from input-output queries. Whereas prior works mainly focused on improving the performance of substitute models, e.g., by developing new substitute training methods, there have been only limited ablation studies on the impact the attacker's strength has on the substitute model's performance. As a result, different authors came to diverse, sometimes contradictory, conclusions. In this work, we exhaustively examine the ambivalent influence of different factors resulting from varying the attacker's capabilities and knowledge on a substitute training attack. Our findings suggest that some of the factors that have been considered important in the past are, in fact, not that influential; instead, we discover new correlations between attack conditions and success rate. In particular, we demonstrate that better-performing target models enable higher-fidelity attacks, and we explain the intuition behind this phenomenon. Further, we propose to shift the focus from the complexity of target models toward the complexity of their learning tasks. Therefore, for the substitute model, rather than aiming for higher architecture complexity, we suggest focusing on obtaining data of higher complexity and an appropriate architecture. Finally, we demonstrate that even in the most limited data-free scenario, there is no need to overcompensate weak knowledge with millions of queries. Our results often exceed or match the performance of previous attacks that assume a stronger attacker, suggesting that these stronger attacks are likely endangering a model owner's intellectual property to a significantly higher degree than previously shown.
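The substitute training attack described in the abstract can be illustrated with a minimal sketch: the attacker sends inputs to a black-box target, records its outputs, and trains a surrogate on those input-output pairs. This is not the paper's exact method; the scikit-learn models, the random query distribution, and the agreement-based fidelity metric below are illustrative assumptions.

```python
# Illustrative sketch of substitute model training via input-output queries.
# Assumptions (not from the paper): scikit-learn stands in for both the
# black-box target and the attacker's surrogate; queries are random vectors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Victim: a target model the attacker can only query, never inspect.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                       random_state=0).fit(X, y)

# Attacker: draw surrogate queries (random inputs here, standing in for
# whatever surrogate data the attacker has) and label them by querying.
queries = rng.normal(size=(1000, 10))
labels = target.predict(queries)  # only input-output access is used

# Train the substitute model on the stolen labels.
substitute = DecisionTreeClassifier(random_state=0).fit(queries, labels)

# Fidelity: how often the substitute agrees with the target on fresh inputs.
test = rng.normal(size=(500, 10))
fidelity = (substitute.predict(test) == target.predict(test)).mean()
print(f"agreement with target: {fidelity:.2f}")
```

In this framing, the paper's factors map directly onto knobs in the sketch: the query budget is the number of rows in `queries`, surrogate data quality is the distribution they are drawn from, and the architectural prior is the choice of substitute model class.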
Problem

Research questions and friction points this paper is trying to address.

Examines impact of attacker's capabilities on model stealing attacks.
Identifies new correlations between attack conditions and success rate.
Proposes focusing on task complexity over model complexity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Examines the impact of attacker capabilities on model theft.
Proposes focusing on task complexity over model complexity.
Shows that high-fidelity attacks are possible with fewer queries.