Explanations Leak: Membership Inference with Differential Privacy and Active Learning Defense

📅 2026-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work examines how counterfactual explanations (CFs), which are added to Machine Learning as a Service (MLaaS) systems to improve transparency, also expand the attack surface for membership inference attacks (MIAs). The authors systematically analyze how exposing CFs through query-based APIs makes shadow-model-based MIAs more effective. To mitigate this risk, they propose a defense framework that combines differential privacy (DP) with active learning (AL), aiming to reduce memorization and limit how much training data is effectively exposed. An empirical evaluation characterizes the three-way trade-off between privacy leakage, predictive performance, and explanation quality, and the findings underline the need to balance transparency, utility, and privacy when deploying explainable MLaaS systems.

📝 Abstract

Counterfactual explanations (CFs) are increasingly integrated into Machine Learning as a Service (MLaaS) systems to improve transparency; however, ML models deployed via APIs are already vulnerable to privacy attacks such as membership inference and model extraction, and the impact of explanations on this threat landscape remains insufficiently understood. In this work, we focus on the problem of how CFs expand the attack surface of MLaaS by strengthening membership inference attacks (MIAs), and on the need to design defense mechanisms that mitigate this emerging risk without undermining utility and explainability. First, we systematically analyze how exposing CFs through query-based APIs enables more effective shadow-based MIAs. Second, we propose a defense framework that integrates Differential Privacy (DP) with Active Learning (AL) to jointly reduce memorization and limit effective training data exposure. Finally, we conduct an extensive empirical evaluation to characterize the three-way trade-off between privacy leakage, predictive performance, and explanation quality. Our findings highlight the need to carefully balance transparency, utility, and privacy in the responsible deployment of explainable MLaaS systems.
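To make the attack idea in the abstract concrete, the following is a minimal sketch of how a counterfactual could become a membership signal. It is not the paper's method. It assumes a linear classifier (so the nearest counterfactual sits at the decision boundary), assumes that training members tend to lie farther from the boundary than non-members, and uses hypothetical helper names (`cf_distance`, `fit_threshold`, `infer_membership`). The threshold is fitted on a shadow model, where the attacker knows membership, and then applied to distances derived from the target API's counterfactuals.

```python
from typing import List, Sequence, Tuple


def cf_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Distance from x to its nearest counterfactual under a linear model.

    Assumption: for a classifier sign(w.x + b), the closest counterfactual
    is the projection of x onto the decision boundary.
    """
    norm = sum(wi * wi for wi in w) ** 0.5
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / norm


def fit_threshold(features: List[float], labels: List[int]) -> Tuple[float, float]:
    """Fit the rule 'member if feature >= t' on shadow-model data.

    features: CF distances computed against the shadow model.
    labels:   1 if the point was in the shadow training set, else 0.
    Returns the best threshold and its accuracy on the shadow data.
    """
    best_t, best_acc = features[0], -1.0
    for t in sorted(set(features)):
        correct = sum((f >= t) == bool(m) for f, m in zip(features, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def infer_membership(feature: float, threshold: float) -> bool:
    """Apply the shadow-fitted threshold to a distance from the target API."""
    return feature >= threshold


# Toy shadow data (illustrative values only, not experimental results).
shadow_features = [0.1, 0.2, 0.9, 1.1]
shadow_labels = [0, 0, 1, 1]
threshold, shadow_acc = fit_threshold(shadow_features, shadow_labels)
```

In a real attack the single distance would typically be one feature among several (for example, alongside prediction confidence) fed to a learned attack classifier. The one-dimensional threshold is used here only to keep the sketch short.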

Problem

Research questions and friction points this paper is trying to address.

Counterfactual Explanations

Membership Inference Attacks

Machine Learning as a Service

Privacy Leakage

Explainability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Counterfactual Explanations

Membership Inference Attack

Differential Privacy