🤖 AI Summary
Model stealing attacks threaten the intellectual property of image classification models offered through Machine Learning as a Service (MLaaS), yet existing attacks lack a unified threat model and standardised evaluation criteria, which hinders comparison across works and reproducibility. To address this gap, the paper proposes the first comprehensive threat model and a framework for comparing model stealing attacks, focusing on the largest attack family: substitute-model attacks against image classifiers. It analyses the experimental setups of prior work to identify which tasks and models have been studied most, distils best practices for attack development before, during, and beyond experiments, and derives an extensive list of open research questions on evaluating model stealing attacks. The resulting evaluation methodology also transfers to other problem domains, making it the first generic one for model stealing attacks.
📝 Abstract
Model stealing attacks endanger the confidentiality of machine learning models offered as a service. Although these models are kept secret, a malicious party can query a model to label data samples and train their own substitute model, violating intellectual property. While novel attacks in the field are continually being published, their design and evaluations are not standardised, making it challenging to compare prior works and assess progress in the field. This paper is the first to address this gap by providing recommendations for designing and evaluating model stealing attacks. To this end, we study the largest group of attacks that rely on training a substitute model -- those attacking image classification models. We propose the first comprehensive threat model and develop a framework for attack comparison. Further, we analyse attack setups from related works to understand which tasks and models have been studied the most. Based on our findings, we present best practices for attack development before, during, and beyond experiments and derive an extensive list of open research questions regarding the evaluation of model stealing attacks. Our findings and recommendations also transfer to other problem domains, hence establishing the first generic evaluation methodology for model stealing attacks.
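To make the attack described above concrete, here is a minimal, self-contained sketch of a substitute-model stealing attack. All names and the toy victim model are illustrative assumptions, not the paper's actual setup: the victim is a hypothetical 2-D linear classifier exposed only through a label-returning query function, and the attacker trains a small logistic-regression substitute on the victim's responses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "victim" model hidden behind a query API: it labels a
# 2-D point by which side of the line x0 + x1 = 1 it falls on.
def victim_predict(x):
    return (x[:, 0] + x[:, 1] > 1.0).astype(int)

# 1. The attacker samples query inputs from a surrogate distribution
#    (here: uniform over the unit square, an assumed attacker prior).
queries = rng.uniform(0.0, 1.0, size=(500, 2))

# 2. The victim labels the queries -- the only access the attacker has.
pseudo_labels = victim_predict(queries)

# 3. The attacker trains a substitute model on the query-label pairs;
#    here a tiny logistic regression fitted by gradient descent.
w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(queries @ w + b)))   # predicted P(y=1)
    w -= lr * queries.T @ (p - pseudo_labels) / len(queries)
    b -= lr * np.mean(p - pseudo_labels)

# 4. Evaluate fidelity: how often the substitute agrees with the victim
#    on fresh inputs it has never queried.
test = rng.uniform(0.0, 1.0, size=(1000, 2))
substitute_pred = ((test @ w + b) > 0).astype(int)
fidelity = np.mean(substitute_pred == victim_predict(test))
print(f"substitute/victim agreement: {fidelity:.2%}")
```

The agreement metric in step 4 is the "fidelity" commonly used to judge substitute models; real attacks differ mainly in how queries are chosen (step 1) and how capable the substitute architecture is (step 3), which is exactly the kind of design choice the paper's framework standardises.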