🤖 AI Summary
Deep learning models in recommender systems achieve high predictive performance but are hard to interpret. To address this, we propose a model-agnostic post-hoc explanation framework grounded in deletion diagnostics: a model is trained without a specific user or item, and the change in recommendation performance relative to the full model quantifies how that observation influences the recommender, either positively or negatively. The work systematically applies, adapts, and evaluates deletion diagnostics, a well-established statistical technique, in the recommender setting. The procedure requires no architectural modification, although it does involve training a comparison model without the deleted observation, and it is demonstrated on both Neural Collaborative Filtering (NCF), a deep learning-based recommender, and Singular Value Decomposition (SVD), a classical collaborative filtering technique. Experiments on MovieLens and Amazon Reviews identify influential users and items, provide insights into model behavior, and highlight the generality of the approach across recommendation paradigms.
📝 Abstract
Recommender systems often benefit from complex feature embeddings and deep learning algorithms, which deliver sophisticated recommendations that enhance user experience, engagement, and revenue. However, these methods frequently reduce the interpretability and transparency of the system. In this research, we systematically apply, adapt, and evaluate deletion diagnostics in the recommender setting. The method compares the performance of a model to that of a similar model trained without a specific user or item, allowing us to quantify how that observation influences the recommender, either positively or negatively. To demonstrate its model-agnostic nature, the method is applied to both Neural Collaborative Filtering (NCF), a widely used deep learning-based recommender, and Singular Value Decomposition (SVD), a classical collaborative filtering technique. Experiments on the MovieLens and Amazon Reviews datasets provide insights into model behavior and highlight the generality of the approach across different recommendation paradigms.
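To make the comparison concrete, the following is a minimal sketch of user-level deletion diagnostics, not the authors' implementation. It rests on several assumptions that do not come from the abstract: the recommender is a stand-in (a rank-2 truncated SVD of a mean-centered rating matrix computed with NumPy), the performance metric is RMSE on held-out ratings of the remaining users, the data are synthetic, and the names `fit_predict`, `rmse`, `user_deletion_influence`, and `demo` are hypothetical.

```python
import numpy as np


def fit_predict(train, rank=2):
    """Stand-in recommender (assumption): truncated SVD of the centered matrix.

    `train` is a users x items array with np.nan marking unobserved ratings.
    """
    observed = ~np.isnan(train)
    mean = train[observed].mean()
    # Unobserved cells are set to 0 after centering, i.e. imputed with the mean.
    centered = np.where(observed, train - mean, 0.0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean + (U[:, :rank] * s[:rank]) @ Vt[:rank]


def rmse(pred, test):
    """Root mean squared error over the observed entries of `test`."""
    mask = ~np.isnan(test)
    return float(np.sqrt(np.mean((pred[mask] - test[mask]) ** 2)))


def user_deletion_influence(train, test, rank=2):
    """For each user, refit without that user and compare held-out error.

    Sign convention (assumption): a positive value means the error on the
    other users rose after deletion, so the user was helping the model.
    """
    full_pred = fit_predict(train, rank)
    influence = {}
    for u in range(train.shape[0]):
        reduced = train.copy()
        reduced[u, :] = np.nan      # delete the user's training interactions
        held_out = test.copy()
        held_out[u, :] = np.nan     # score both models on the other users only
        without_u = rmse(fit_predict(reduced, rank), held_out)
        with_u = rmse(full_pred, held_out)
        influence[u] = without_u - with_u
    return influence


def demo():
    """Run the diagnostic on small synthetic low-rank ratings."""
    rng = np.random.default_rng(0)
    n_users, n_items = 8, 6
    ratings = rng.normal(size=(n_users, 2)) @ rng.normal(size=(2, n_items)) + 3.0
    # Hold out exactly one rating per user so every evaluation set is non-empty.
    test_mask = np.zeros((n_users, n_items), dtype=bool)
    for u in range(n_users):
        test_mask[u, u % n_items] = True
    train = np.where(test_mask, np.nan, ratings)
    test = np.where(test_mask, ratings, np.nan)
    return user_deletion_influence(train, test)


if __name__ == "__main__":
    for user, delta in demo().items():
        print(f"user {user}: change in RMSE after deletion = {delta:+.4f}")
```

The same loop applies to items by deleting columns instead of rows, and `fit_predict` could in principle be replaced by any trainable recommender, which is the sense in which the approach is model-agnostic. The cost is one additional model fit per deleted user or item.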