Towards integration of Privacy Enhancing Technologies in Explainable Artificial Intelligence

📅 2025-07-06
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Feature-level explanations in eXplainable AI (XAI) are vulnerable to attribute inference attacks, risking leakage of sensitive personal information, yet effective defenses remain scarce. This paper presents the first systematic evaluation of three privacy-enhancing technologies (differentially private training, synthetic data generation, and noise injection) in the context of feature-based XAI methods (e.g., LIME, SHAP), and proposes an integrated framework that jointly optimizes privacy preservation and explanation utility. Experimental results show that the proposed approach reduces attribute inference attack success rates by up to 49.47% under the best configuration, while leaving model prediction accuracy and explanation fidelity nearly unchanged. The core contributions are: (i) establishing a quantitative assessment paradigm for XAI-specific privacy risks, and (ii) empirically validating the effectiveness and feasibility of embedding privacy-enhancing technologies directly into the explanation generation phase.
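
The attack the summary quantifies can be pictured as a supervised inference problem: the adversary collects the explanation vectors a system releases and learns a mapping from them to a sensitive attribute. The sketch below illustrates that threat model; the fabricated data, variable names, and leakage pattern are our illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical threat model: each row is the explanation vector (e.g.,
# SHAP or LIME attributions) returned for one individual's query, and
# `sensitive` is a binary attribute (e.g., a demographic flag) that the
# attacker tries to infer from those vectors alone.
rng = np.random.default_rng(0)
n_records, n_features = 2000, 10
explanations = rng.normal(size=(n_records, n_features))
# Inject correlation so that feature 3's attribution leaks the attribute.
sensitive = (explanations[:, 3] + 0.5 * rng.normal(size=n_records) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    explanations, sensitive, test_size=0.3, random_state=0)
attacker = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Accuracy well above 50% on this balanced task indicates leakage.
print(f"attribute inference accuracy: {attacker.score(X_te, y_te):.2%}")
```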

📝 Abstract
Explainable Artificial Intelligence (XAI) is a crucial pathway in mitigating the risk of non-transparency in the decision-making process of black-box Artificial Intelligence (AI) systems. However, despite the benefits, XAI methods are found to leak the privacy of individuals whose data is used in training or querying the models. Researchers have demonstrated privacy attacks that exploit explanations to infer sensitive personal information of individuals. Currently, there is a lack of defenses against known privacy attacks that target explanations when vulnerable XAI methods are used in production and Machine Learning as a Service (MLaaS) systems. To address this gap, in this article, we explore Privacy Enhancing Technologies (PETs) as a defense mechanism against attribute inference on explanations provided by feature-based XAI methods. We empirically evaluate three types of PETs, namely synthetic training data, differentially private training, and noise addition, on two categories of feature-based XAI methods. Our evaluation reveals differing responses across the mitigation methods, as well as side effects of PETs on other system properties such as utility and performance. In the best case, integrating PETs into explanations reduced the risk of the attack by 49.47%, while maintaining model utility and explanation quality. Through our evaluation, we identify strategies for using PETs in XAI that maximize benefits and minimize the success of this privacy attack on sensitive personal information.
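
Of the three PETs the abstract names, noise addition is the simplest to picture: perturb the attribution vector before releasing it to the querier. The following is a minimal sketch of what that could look like; the function name `noisy_explanation` and the Laplace noise scale are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def noisy_explanation(attributions: np.ndarray, scale: float = 0.1,
                      rng=None) -> np.ndarray:
    """Perturb a feature-attribution vector (e.g., LIME or SHAP output)
    with zero-mean Laplace noise before releasing it.

    `scale` trades privacy (larger = harder to invert) against
    explanation fidelity; here it is an illustrative knob, not a
    calibrated differential-privacy parameter.
    """
    rng = rng or np.random.default_rng()
    return attributions + rng.laplace(0.0, scale, size=attributions.shape)

# Example: perturb a SHAP-style attribution vector for one query point.
phi = np.array([0.42, -0.13, 0.07, 0.31])
print(noisy_explanation(phi, scale=0.05))
```
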
Problem

Research questions and friction points this paper is trying to address.

XAI methods leak the privacy of individuals whose data is used in training or querying the models
Defenses against privacy attacks on XAI explanations are currently lacking
PETs are explored as a mitigation for attribute inference attacks in XAI (see the synthetic-data sketch after this list)
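
One way to read the mitigation point above: train the model on synthetic records so that the model producing explanations never sees real individuals directly. A minimal sketch under assumed components follows; the per-class Gaussian mixture generator and the fabricated dataset are illustrative choices, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

# Fabricated stand-in for a real tabular training set.
rng = np.random.default_rng(0)
X_real = rng.normal(size=(1000, 5))
y_real = (X_real[:, 0] - X_real[:, 2] > 0).astype(int)

# Fit a generative model per class and train only on its samples, so
# released explanations are computed from a model trained without
# direct access to the real records.
X_parts, y_parts = [], []
for label in (0, 1):
    gmm = GaussianMixture(n_components=3, random_state=0)
    gmm.fit(X_real[y_real == label])
    samples, _ = gmm.sample(1000)
    X_parts.append(samples)
    y_parts.append(np.full(len(samples), label))
X_syn, y_syn = np.vstack(X_parts), np.concatenate(y_parts)

model = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
print(f"utility on real data: {model.score(X_real, y_real):.2%}")
```
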
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Privacy Enhancing Technologies (PETs) into XAI
Evaluates synthetic training data, differentially private training, and noise addition (see the DP training sketch after this list)
Reduces attack risk by up to 49.47% in the best case
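
For the differentially private training arm, a minimal DP-SGD sketch conveys the idea: clip each per-example gradient, then add Gaussian noise before the update. Everything below (function name, hyperparameters, toy data) is an illustrative assumption; a real deployment would use a vetted library such as Opacus and a privacy accountant to track the (epsilon, delta) budget.

```python
import numpy as np

def dp_sgd_logreg(X, y, epochs=20, lr=0.5, clip=1.0, noise_mult=1.0, seed=0):
    """Minimal DP-SGD sketch for logistic regression: clip each
    per-example gradient to L2 norm `clip`, add Gaussian noise with
    std `noise_mult * clip` to the summed gradient, then step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid predictions
        per_ex = (p - y)[:, None] * X          # per-example gradients, shape (n, d)
        norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
        clipped = per_ex / np.maximum(1.0, norms / clip)
        noise = rng.normal(0.0, noise_mult * clip, size=d)
        w -= lr * (clipped.sum(axis=0) + noise) / n
    return w

# Toy usage on fabricated data.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0).astype(int)
w = dp_sgd_logreg(X, y)
print(f"accuracy of the DP-trained model: {((X @ w > 0) == y).mean():.2%}")
```
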
Sonal Allana
School of Computer Science, University of Guelph, 474 Gordon St., Guelph, N1G 1Y4, Ontario, Canada

Rozita Dara
Associate Professor, University of Guelph

Xiaodong Lin
Professor, IEEE Fellow, University of Guelph, Canada
Research interests: information security, privacy, digital forensics, wireless network security, applied cryptography

Pulei Xiong
School of Computer Science, University of Guelph, 474 Gordon St., Guelph, N1G 1Y4, Ontario, Canada