On the interplay of Explainability, Privacy and Predictive Performance with Explanation-assisted Model Extraction

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
In ML-as-a-Service (MLaaS) settings, counterfactual explanations (CFs) inadvertently amplify model extraction attacks (MEAs), posing severe privacy risks. Method: We systematically characterize this CF-induced amplification effect and propose a two-stage differential privacy (DP) defense—injecting calibrated noise during both model training and CF generation. Contribution/Results: We introduce the first tri-dimensional evaluation framework jointly quantifying interpretability, privacy, and predictive performance. Empirical results demonstrate that applying DP exclusively during CF generation—rather than model training—achieves superior trade-offs: it incurs only a 1.2% drop in classification accuracy while reducing MEA success rate by over 68%. This significantly improves the balance between privacy protection and utility. Our work provides both theoretical foundations and practical mechanisms for securely deploying interpretable AI in privacy-sensitive services.
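The "DP at CF generation" strategy the summary describes can be sketched with the standard Laplace mechanism: calibrated noise is added to each feature of a counterfactual before it is released to the querying user, with the noise scale set to sensitivity/ε. This is a minimal stdlib-only illustration of the general idea, not the paper's actual mechanism; the function names, sensitivity value, and feature values below are all hypothetical.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_counterfactual(cf, epsilon, sensitivity=1.0, seed=None):
    """Release a counterfactual with per-feature Laplace noise.

    Standard Laplace mechanism: noise scale = sensitivity / epsilon,
    so a larger epsilon means less noise and weaker privacy.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in cf]

# Hypothetical counterfactual for a 3-feature input, released under epsilon = 1
noisy_cf = dp_counterfactual([0.62, 0.31, 0.88], epsilon=1.0)
```

The privacy/utility tension the paper studies shows up directly in `epsilon`: smaller values add more noise to the released CF, degrading its usefulness to an honest user but also to an extraction adversary.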

📝 Abstract
Machine Learning as a Service (MLaaS) has gained significant traction as a means of deploying powerful predictive models, offering ease of use that enables organizations to leverage advanced analytics without substantial investments in specialized infrastructure or expertise. However, MLaaS platforms must be safeguarded against security and privacy attacks, such as model extraction attacks (MEAs). The increasing integration of explainable AI (XAI) within MLaaS has introduced an additional privacy challenge, as attackers can exploit model explanations, particularly counterfactual explanations (CFs), to facilitate MEAs. In this paper, we investigate the trade-offs among model performance, privacy, and explainability when employing Differential Privacy (DP), a promising technique for mitigating CF-facilitated MEAs. We evaluate two distinct DP strategies: one applied during training of the classification model, and one applied at the explainer during CF generation.
Problem

Research questions and friction points this paper is trying to address.

Investigates trade-offs between model performance, privacy, and explainability
Examines Differential Privacy's role in mitigating explanation-assisted model extraction
Evaluates DP strategies during model training and counterfactual explanation generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Differential Privacy for model protection
Evaluating DP in classification training phase
Assessing DP during counterfactual explanation generation
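For the training-side strategy in the list above, the usual way to make model training differentially private is DP-SGD-style gradient perturbation: clip each example's gradient to a fixed norm, sum, add Gaussian noise calibrated to that clipping norm, then update. The plain-Python sketch below illustrates one such update step; it is an assumption that the paper uses this particular recipe, and all names and hyperparameters here are illustrative.

```python
import math
import random

def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_mult, seed=None):
    """One DP-SGD-style update on a list-of-floats weight vector.

    Each example's gradient is clipped to L2 norm <= clip_norm, the clipped
    gradients are summed, Gaussian noise with sigma = clip_norm * noise_mult
    is added, and the noisy average drives an ordinary gradient step.
    """
    rng = random.Random(seed)
    dim, n = len(weights), len(per_example_grads)
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * scale
    sigma = clip_norm * noise_mult
    noisy_avg = [(summed[i] + rng.gauss(0.0, sigma)) / n for i in range(dim)]
    return [w - lr * g for w, g in zip(weights, noisy_avg)]

# One illustrative step: a single example gradient [3, 4] clipped to norm 1
new_w = dp_sgd_step([0.0, 0.0], [[3.0, 4.0]],
                    lr=1.0, clip_norm=1.0, noise_mult=0.5)
```

Because the noise here perturbs every training step, this strategy tends to cost more accuracy than noising only the explainer, which matches the summary's finding that DP applied exclusively during CF generation gives the better trade-off.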