🤖 AI Summary
This work addresses the vulnerability of Machine Learning as a Service (MLaaS) to model extraction attacks by proposing CREDIT, a certified framework for model ownership verification. The approach uses mutual information to quantify the similarity between deep neural networks and derives a practical verification threshold, enabling reliable identification of extracted models. Unlike existing methods, the framework provides rigorous theoretical guarantees for detecting unauthorized model replicas. Extensive experiments across multiple benchmark datasets and tasks demonstrate state-of-the-art performance, confirming both the effectiveness and practicality of the proposed method. The implementation is publicly available to facilitate reproducibility and further research.
📝 Abstract
Machine Learning as a Service (MLaaS) has emerged as a widely adopted paradigm for providing access to deep neural network (DNN) models, enabling users to conveniently leverage them through standardized APIs. However, such services are highly vulnerable to Model Extraction Attacks (MEAs), where an adversary repeatedly queries a target model to collect input-output pairs and uses them to train a surrogate model that closely replicates its functionality. While numerous defense strategies have been proposed, verifying the ownership of a suspicious model with strict theoretical guarantees remains a challenging task. To address this gap, we introduce CREDIT, a certified ownership verification framework against MEAs. Specifically, we employ mutual information to quantify the similarity between DNN models, propose a practical verification threshold, and provide rigorous theoretical guarantees for ownership verification based on this threshold. We extensively evaluate our approach on several mainstream datasets across different domains and tasks, achieving state-of-the-art performance. Our implementation is publicly available at: https://github.com/LabRAI/CREDIT.
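The verification idea described above can be illustrated with a minimal sketch: query the victim and suspect models on the same inputs, estimate the mutual information between their predicted labels, and flag the suspect as a likely extraction if the score exceeds a threshold. This is only a schematic reading of the abstract, assuming a simple plug-in MI estimator over hard labels; the function names, the estimator, and the threshold `tau` are illustrative assumptions, not CREDIT's actual construction.

```python
import numpy as np

def label_mutual_information(labels_a, labels_b, num_classes):
    """Plug-in estimate (in nats) of the mutual information between two
    models' predicted labels on the same query set. An extracted copy
    tends to agree with the victim, yielding high MI; an independently
    trained model yields MI near zero."""
    joint = np.zeros((num_classes, num_classes))
    for a, b in zip(labels_a, labels_b):
        joint[a, b] += 1.0
    joint /= joint.sum()                      # empirical joint distribution
    pa = joint.sum(axis=1, keepdims=True)     # marginal of model A
    pb = joint.sum(axis=0, keepdims=True)     # marginal of model B
    mask = joint > 0                          # avoid log(0) terms
    return float((joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])).sum())

def verify_ownership(mi_score, tau):
    """Declare the suspect an extraction of the victim when the MI-based
    similarity exceeds the verification threshold tau (hypothetical)."""
    return mi_score > tau

# Toy usage: a perfectly agreeing suspect vs. an unrelated one.
victim = [0, 0, 1, 1]
copy_of_victim = [0, 0, 1, 1]       # agrees everywhere -> MI = log 2
unrelated = [0, 1, 0, 1]            # statistically independent -> MI = 0
print(verify_ownership(label_mutual_information(victim, copy_of_victim, 2), 0.5))
print(verify_ownership(label_mutual_information(victim, unrelated, 2), 0.5))
```

In practice the threshold would come from the paper's theoretical analysis rather than being hand-picked as here; the point of the sketch is only the shape of the pipeline (query, score, compare).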