🤖 AI Summary
Existing interpretable recommendation systems exhibit poor explanation robustness under noise and white-box attacks, weak cross-dataset generalizability, and vulnerability to adversarial manipulation, posing serious risks in high-stakes decision-making. To address these limitations, we propose the first model-agnostic, feature-oriented framework for robust interpretability in recommendation. Our approach integrates feature-aware explanation regularization, adversarially robust training, and a plug-and-play interface so that global interpretability remains both stable and broadly applicable under attack. The framework is compatible with diverse feature-driven interpretability algorithms. Extensive experiments on three e-commerce datasets demonstrate substantial improvements in explanation quality for two state-of-the-art recommendation models. Under noisy and adversarial conditions, our method improves global explanation stability by 32.7% on average, significantly enhancing the reliability and trustworthiness of model interpretations.
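The summary names three ingredients: feature-aware explanation regularization, adversarially robust training, and a plug-and-play interface. As one concrete, purely illustrative reading of the first two, the PyTorch sketch below perturbs the input features FGSM-style and penalizes how much a gradient-based feature attribution drifts under that perturbation; `explanation`, `robust_step`, `epsilon`, and `lam` are our own assumptions, not the paper's actual method or API.

```python
# Hypothetical sketch: one training step combining feature-aware explanation
# regularization with adversarial (FGSM-style) training. All identifiers are
# illustrative assumptions, not the authors' API.
import torch
import torch.nn.functional as F

def explanation(model: torch.nn.Module, features: torch.Tensor) -> torch.Tensor:
    """Simple gradient-based feature attribution: d(score)/d(features)."""
    features = features.detach().requires_grad_(True)
    score = model(features).sum()
    # create_graph=True lets the stability penalty backprop through this grad.
    grad, = torch.autograd.grad(score, features, create_graph=True)
    return grad

def robust_step(model, features, labels, optimizer, epsilon=0.05, lam=1.0):
    # Craft an FGSM perturbation of the input features (adversarial training).
    feats = features.detach().requires_grad_(True)
    adv_loss = F.binary_cross_entropy_with_logits(model(feats), labels)
    grad, = torch.autograd.grad(adv_loss, feats)
    features_adv = (feats + epsilon * grad.sign()).detach()

    optimizer.zero_grad()
    rec_loss = F.binary_cross_entropy_with_logits(model(features), labels)
    # Penalize drift between clean and perturbed attributions so that the
    # global explanation stays stable under attack.
    stability = (explanation(model, features)
                 - explanation(model, features_adv)).pow(2).mean()
    loss = rec_loss + lam * stability
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here `lam` trades recommendation accuracy against explanation stability; the paper's actual regularizer and attack model may differ.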
📝 Abstract
Explainable recommender systems form an important field of study, providing the reasons behind suggested recommendations. Explanations help developers debug anomalies in the system and help consumers judge how well a model captures their true preferences toward items. However, most existing state-of-the-art (SOTA) explainable recommenders cannot retain their explanation capability under noisy conditions and, moreover, do not generalize across datasets. Explanation robustness must be ensured so that malicious attackers cannot manipulate high-stakes decision scenarios to their advantage, which could have severe consequences for large groups of stakeholders. In this work, we present a general framework for feature-aware explainable recommenders that withstands external attacks and provides robust, generalized explanations. The framework serves as an additional defense tool, preserving global explainability under model-based white-box attacks. It is simple to implement and supports different explanation methods regardless of a model's internal structure or intrinsic utility. We evaluated our framework on two architecturally different feature-based SOTA explainable algorithms, training them on three popular e-commerce datasets of increasing scale. Both algorithms showed an overall improvement in the quality and robustness of global explainability in normal as well as noisy environments across all datasets, indicating the flexibility and adaptability of our framework.
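Since the abstract stresses that the framework is simple to implement and independent of the internal model structure, one plausible design is a thin wrapper that treats any feature-based recommender and any feature-attribution function as opaque callables. The sketch below is an assumption about what such an interface could look like; `RobustExplainableRecommender`, `explainer`, and `lam` are hypothetical names, not the paper's interface.

```python
# Hypothetical sketch of a "plug-and-play" model-agnostic wrapper. Names are
# illustrative assumptions, not the paper's actual API.
from typing import Callable

import torch
import torch.nn.functional as F

# An explainer maps (model, features) -> per-feature attribution scores.
Explainer = Callable[[torch.nn.Module, torch.Tensor], torch.Tensor]

class RobustExplainableRecommender:
    def __init__(self, model: torch.nn.Module, explainer: Explainer,
                 lam: float = 1.0):
        self.model = model          # any feature-driven recommender
        self.explainer = explainer  # any feature-attribution method
        self.lam = lam              # weight of the explanation-stability term

    def loss(self, features: torch.Tensor, labels: torch.Tensor,
             features_adv: torch.Tensor) -> torch.Tensor:
        """Recommendation loss plus a penalty on how much the explanation
        drifts between clean and adversarially perturbed features."""
        rec = F.binary_cross_entropy_with_logits(self.model(features), labels)
        drift = (self.explainer(self.model, features)
                 - self.explainer(self.model, features_adv)).pow(2).mean()
        return rec + self.lam * drift
```

Because the wrapper only calls `model(features)` and `explainer(model, features)`, swapping in a different recommender or attribution method would require no change to the training loop, which is consistent with the abstract's claim of supporting different methods regardless of internal model structure.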