🤖 AI Summary
In intelligent AIoT environments, post-hoc XAI methods (e.g., SHAP, LIME) enhance model interpretability but can inadvertently leak user-sensitive attributes, posing significant privacy risks. To address this, we propose SHAP entropy regularization: a novel training-time mechanism that explicitly penalizes low-entropy SHAP feature attribution distributions, thereby suppressing the model's capacity to encode sensitive information in its explanations. To our knowledge, this is the first work to incorporate information entropy as a constraint on SHAP explanations, jointly optimizing explanation clarity and robustness against explanation-based sensitive-attribute inference attacks. We establish a privacy attack evaluation framework on a real-world smart-appliance energy consumption dataset. Experiments show that our method reduces sensitive-attribute inference accuracy by up to 37.6% relative to baselines while preserving high predictive accuracy (within ±0.5%) and explanation fidelity (SHAP consistency >0.92), jointly strengthening privacy and interpretability guarantees.
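The threat model above (inferring a sensitive attribute from explanation outputs) can be illustrated with a minimal sketch. The paper's actual attack suite is not specified here, so this uses synthetic data standing in for per-sample SHAP attribution vectors, with one "leaky" feature whose attribution correlates with a hypothetical sensitive attribute; the attacker simply trains a classifier on the explanation vectors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical setup: each row is the SHAP attribution vector for one
# household's energy-consumption prediction; the label is a sensitive
# attribute (e.g., occupancy) the attacker tries to infer from explanations.
rng = np.random.default_rng(0)
n, d = 1000, 8
sensitive = rng.integers(0, 2, size=n)

# Leaky explanations: feature 0's attribution shifts with the attribute.
shap_values = rng.normal(size=(n, d))
shap_values[:, 0] += 1.5 * sensitive

X_tr, X_te, y_tr, y_te = train_test_split(shap_values, sensitive, random_state=0)
attack = LogisticRegression().fit(X_tr, y_tr)
acc = attack.score(X_te, y_te)  # well above chance (0.5): explanations leak
```

The attack succeeds precisely because the explanation distribution is concentrated and attribute-dependent, which is the leakage channel the entropy regularizer targets.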
📝 Abstract
The widespread integration of Artificial Intelligence of Things (AIoT) in smart home environments has amplified the demand for transparent and interpretable machine learning models. To foster user trust and comply with emerging regulatory frameworks, Explainable AI (XAI) methods, particularly post-hoc techniques such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), are widely employed to elucidate model behavior. However, recent studies have shown that these explanation methods can inadvertently expose sensitive user attributes and behavioral patterns, thereby introducing new privacy risks. To address these concerns, we propose a novel privacy-preserving approach based on SHAP entropy regularization to mitigate privacy leakage in explainable AIoT applications. Our method incorporates an entropy-based regularization objective that penalizes low-entropy SHAP attribution distributions during training, promoting a more uniform spread of feature contributions. To evaluate the effectiveness of our approach, we develop a suite of SHAP-based privacy attacks that strategically leverage model explanation outputs to infer sensitive information. We validate our method through comparative evaluations using these attacks alongside utility metrics on benchmark smart home energy consumption datasets. Experimental results demonstrate that SHAP entropy regularization substantially reduces privacy leakage compared to baseline models, while maintaining high predictive accuracy and faithful explanation fidelity. This work contributes to the development of privacy-preserving explainable AI techniques for secure and trustworthy AIoT applications.
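The core regularizer described above can be sketched concisely. The paper's exact loss formulation is not given in this summary, so the following is an assumption-laden illustration: absolute SHAP values for a sample are normalized into a distribution, its Shannon entropy is computed, and the negated entropy is returned so that adding the penalty to the training loss pushes attributions toward a more uniform spread:

```python
import numpy as np

def shap_entropy_penalty(shap_values, eps=1e-12):
    """Entropy-based penalty on one sample's SHAP attribution vector.

    Normalizes |phi_i| into a distribution p, computes Shannon entropy
    H(p) = -sum_i p_i * log(p_i), and returns -H(p): concentrated
    (low-entropy) attributions incur a larger penalty than uniform ones.
    """
    p = np.abs(np.asarray(shap_values, dtype=float)) + eps
    p = p / p.sum()
    return float(np.sum(p * np.log(p)))  # = -H(p)

# Concentrated attributions are penalized more than uniform ones.
concentrated = np.array([0.95, 0.02, 0.02, 0.01])
uniform = np.array([0.25, 0.25, 0.25, 0.25])
penalty_gap = shap_entropy_penalty(concentrated) - shap_entropy_penalty(uniform)
```

In training, a term like `lambda * shap_entropy_penalty(phi)` (with `lambda` a tuning weight, and `phi` typically approximated by a differentiable attribution surrogate) would be added to the task loss; computing exact SHAP values inside the training loop is generally too expensive, which is one design pressure on any concrete implementation.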