SHAP-based Explanations are Sensitive to Feature Representation

📅 2025-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper identifies a systematic vulnerability of local feature-based explanation methods (e.g., SHAP) at the feature representation level: standard data engineering operations, such as binning age or choosing a particular categorical encoding for a sensitive attribute, can substantially distort feature importance rankings and can even be exploited adversarially to obscure model discrimination. Method: We conduct a systematic sensitivity analysis, construct adversarial representations, and empirically evaluate how representation choices affect SHAP explanations across real-world datasets and fairness-critical models. Contribution/Results: We demonstrate, to the best of our knowledge for the first time, that conventional preprocessing alone suffices to mislead mainstream explainers. This extends prior work, which focused on adversarial attacks that bias the data or manipulate the model, and introduces the concept of “representation-layer vulnerability.” Our experiments show that alternative encodings can fully invert SHAP importance orderings. We further propose actionable detection heuristics and mitigation guidelines to strengthen the reliability and robustness of XAI in fairness auditing.

📝 Abstract

Local feature-based explanations are a key component of the XAI toolkit. These explanations compute feature importance values relative to an "interpretable" feature representation. In tabular data, feature values themselves are often considered interpretable. This paper examines the impact of data engineering choices on local feature-based explanations. We demonstrate that simple, common data engineering techniques, such as representing age with a histogram or encoding race in a specific way, can manipulate feature importance as determined by popular methods like SHAP. Notably, the sensitivity of explanations to feature representation can be exploited by adversaries to obscure issues like discrimination. While the intuition behind these results is straightforward, their systematic exploration has been lacking. Previous work has focused on adversarial attacks on feature-based explainers by biasing data or manipulating models. To the best of our knowledge, this is the first study demonstrating that explainers can be misled by standard, seemingly innocuous data engineering techniques.
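The effect described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not taken from the paper: the model, the background values, the bin boundary at 40, and the choice of value function (a feature "in the coalition" is restricted to background values that share the explained instance's encoded value, and features are sampled independently). It computes exact Shapley values in pure Python rather than calling the SHAP library.

```python
from itertools import combinations, product
from math import factorial

# Hypothetical background data (not from the paper).
AGES = [20, 30, 40, 50, 60, 70]
INCOMES = [20, 40, 60, 80]  # in thousands

def model(age, income):
    # Toy model: a step at age 55 plus a small linear income effect.
    return (1.0 if age >= 55 else 0.0) + income / 100.0

def raw(value):
    # Identity encoding: the raw value is the interpretable feature.
    return value

def wide_age_bin(age):
    # Coarse histogram-style encoding with one assumed boundary at 40.
    return "under_40" if age < 40 else "40_plus"

def shapley(x, backgrounds, encoders):
    """Exact Shapley values of model at x, relative to the given encoders."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep only background values with the
        # same encoded value as x; the others range over the full background.
        pools = []
        for j in range(n):
            if j in coalition:
                code = encoders[j](x[j])
                pool = [b for b in backgrounds[j] if encoders[j](b) == code]
                pools.append(pool or [x[j]])
            else:
                pools.append(backgrounds[j])
        rows = list(product(*pools))
        return sum(model(*row) for row in rows) / len(rows)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi

x = (60, 80)  # instance to explain: age 60, income 80
phi_raw = shapley(x, [AGES, INCOMES], [raw, raw])
phi_binned = shapley(x, [AGES, INCOMES], [wide_age_bin, raw])

print("raw age:    age=%.3f income=%.3f" % tuple(phi_raw))
print("binned age: age=%.3f income=%.3f" % tuple(phi_binned))
```

In this toy setup, age receives an attribution of about 0.667 under the raw representation and about 0.167 once age is binned, while income stays at 0.300, so the importance ranking of the two features flips even though the model itself is unchanged. The numbers depend entirely on the assumed value function and bin boundary; the SHAP library's explainers may handle background data differently.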

Problem

Research questions and friction points this paper is trying to address.

Impact of data engineering on feature-based explanations

Manipulation of SHAP values by common data techniques

Adversarial exploitation of representation-sensitive explanations

Innovation

Methods, ideas, or system contributions that make the work stand out.

SHAP explanations vary with feature representation

Data engineering affects feature importance values

Adversaries can exploit representation sensitivity

🔎 Similar Papers

Improving the Weighting Strategy in KernelSHAP