Towards integration of Privacy Enhancing Technologies in Explainable Artificial Intelligence

📅 2025-07-06
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Feature-level explanations in eXplainable AI (XAI) are vulnerable to attribute inference attacks, risking leakage of sensitive personal information, yet effective defenses remain scarce. This paper presents the first systematic evaluation of three privacy-enhancing technologies (differentially private training, synthetic data generation, and noise injection) in the context of feature-based XAI methods (e.g., LIME, SHAP), and proposes an integrated framework that jointly optimizes privacy preservation and explanation utility. Experimental results show that the proposed approach reduces attribute inference attack success rates by up to 49.47% under the best configuration, while leaving model prediction accuracy and explanation fidelity nearly unchanged. The core contributions are: (i) establishing a quantitative assessment paradigm for XAI-specific privacy risks, and (ii) empirically validating the effectiveness and feasibility of embedding privacy-enhancing technologies directly into the explanation generation phase.
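
The attack the summary quantifies can be pictured as a supervised inference problem: the adversary collects the explanation vectors a system releases and learns a mapping from them to a sensitive attribute. The sketch below illustrates that threat model; the fabricated data, variable names, and leakage pattern are our illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical threat model: each row is the explanation vector (e.g.,
# SHAP or LIME attributions) returned for one individual's query, and
# `sensitive` is a binary attribute (e.g., a demographic flag) that the
# attacker tries to infer from those vectors alone.
rng = np.random.default_rng(0)
n_records, n_features = 2000, 10
explanations = rng.normal(size=(n_records, n_features))
# Inject correlation so that feature 3's attribution leaks the attribute.
sensitive = (explanations[:, 3] + 0.5 * rng.normal(size=n_records) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    explanations, sensitive, test_size=0.3, random_state=0)
attacker = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Accuracy well above 50% on this balanced task indicates leakage.
print(f"attribute inference accuracy: {attacker.score(X_te, y_te):.2%}")
```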

📝 Abstract
Explainable Artificial Intelligence (XAI) is a crucial pathway in mitigating the risk of non-transparency in the decision-making process of black-box Artificial Intelligence (AI) systems. However, despite the benefits, XAI methods are found to leak the privacy of individuals whose data is used in training or querying the models. Researchers have demonstrated privacy attacks that exploit explanations to infer sensitive personal information of individuals. Currently, there is a lack of defenses against known privacy attacks that target explanations when vulnerable XAI methods are used in production and Machine Learning as a Service (MLaaS) systems. To address this gap, in this article, we explore Privacy Enhancing Technologies (PETs) as a defense mechanism against attribute inference on explanations provided by feature-based XAI methods. We empirically evaluate three types of PETs, namely synthetic training data, differentially private training, and noise addition, on two categories of feature-based XAI methods. Our evaluation reveals differing responses across the mitigation methods, as well as side effects of PETs on other system properties such as utility and performance. In the best case, integrating PETs into explanations reduced the risk of the attack by 49.47%, while maintaining model utility and explanation quality. Through our evaluation, we identify strategies for using PETs in XAI that maximize benefits and minimize the success of this privacy attack on sensitive personal information.
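
Of the three PETs the abstract names, noise addition is the simplest to picture: perturb the attribution vector before releasing it to the querier. The following is a minimal sketch of what that could look like; the function name `noisy_explanation` and the Laplace noise scale are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def noisy_explanation(attributions: np.ndarray, scale: float = 0.1,
                      rng=None) -> np.ndarray:
    """Perturb a feature-attribution vector (e.g., LIME or SHAP output)
    with zero-mean Laplace noise before releasing it.

    `scale` trades privacy (larger = harder to invert) against
    explanation fidelity; here it is an illustrative knob, not a
    calibrated differential-privacy parameter.
    """
    rng = rng or np.random.default_rng()
    return attributions + rng.laplace(0.0, scale, size=attributions.shape)

# Example: perturb a SHAP-style attribution vector for one query point.
phi = np.array([0.42, -0.13, 0.07, 0.31])
print(noisy_explanation(phi, scale=0.05))
```
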
Problem

Research questions and friction points this paper is trying to address.

XAI methods leak the privacy of individuals whose data is used in training or querying the models
Defenses against privacy attacks on XAI explanations are currently lacking
PETs are explored as a mitigation for attribute inference attacks in XAI (see the synthetic-data sketch after this list)
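
One way to read the mitigation point above: train the model on synthetic records so that the model producing explanations never sees real individuals directly. A minimal sketch under assumed components follows; the per-class Gaussian mixture generator and the fabricated dataset are illustrative choices, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

# Fabricated stand-in for a real tabular training set.
rng = np.random.default_rng(0)
X_real = rng.normal(size=(1000, 5))
y_real = (X_real[:, 0] - X_real[:, 2] > 0).astype(int)

# Fit a generative model per class and train only on its samples, so
# released explanations are computed from a model trained without
# direct access to the real records.
X_parts, y_parts = [], []
for label in (0, 1):
    gmm = GaussianMixture(n_components=3, random_state=0)
    gmm.fit(X_real[y_real == label])
    samples, _ = gmm.sample(1000)
    X_parts.append(samples)
    y_parts.append(np.full(len(samples), label))
X_syn, y_syn = np.vstack(X_parts), np.concatenate(y_parts)

model = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
print(f"utility on real data: {model.score(X_real, y_real):.2%}")
```
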
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Privacy Enhancing Technologies (PETs) into XAI
Evaluates synthetic training data, differentially private training, and noise addition (see the DP training sketch after this list)
Reduces attack risk by up to 49.47% in the best case
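
For the differentially private training arm, a minimal DP-SGD sketch conveys the idea: clip each per-example gradient, then add Gaussian noise before the update. Everything below (function name, hyperparameters, toy data) is an illustrative assumption; a real deployment would use a vetted library such as Opacus and a privacy accountant to track the (epsilon, delta) budget.

```python
import numpy as np

def dp_sgd_logreg(X, y, epochs=20, lr=0.5, clip=1.0, noise_mult=1.0, seed=0):
    """Minimal DP-SGD sketch for logistic regression: clip each
    per-example gradient to L2 norm `clip`, add Gaussian noise with
    std `noise_mult * clip` to the summed gradient, then step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid predictions
        per_ex = (p - y)[:, None] * X          # per-example gradients, shape (n, d)
        norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
        clipped = per_ex / np.maximum(1.0, norms / clip)
        noise = rng.normal(0.0, noise_mult * clip, size=d)
        w -= lr * (clipped.sum(axis=0) + noise) / n
    return w

# Toy usage on fabricated data.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0).astype(int)
w = dp_sgd_logreg(X, y)
print(f"accuracy of the DP-trained model: {((X @ w > 0) == y).mean():.2%}")
```
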
Sonal Allana
School of Computer Science, University of Guelph, 474 Gordon St., Guelph, N1G 1Y4, Ontario, Canada

Rozita Dara
Associate Professor, University of Guelph

Xiaodong Lin
Professor, IEEE Fellow, University of Guelph, Canada
Research interests: information security, privacy, digital forensics, wireless network security, applied cryptography

Pulei Xiong
School of Computer Science, University of Guelph, 474 Gordon St., Guelph, N1G 1Y4, Ontario, Canada