🤖 AI Summary
Existing recommender systems suffer from semantic inconsistency between natural language explanations and predicted ratings, yet current evaluation paradigms overlook this critical quality dimension, undermining explanation credibility and practical utility.
Method: We first systematically identify the “explanation–prediction coherence” deficiency and propose a differentiable, automated coherence metric. We further design an end-to-end Transformer-based joint modeling framework that co-optimizes explanation generation and rating prediction.
Contribution/Results: Human evaluation and automatic metrics confirm that our approach significantly improves coherence (p < 0.01) without degrading core recommendation performance, maintaining or improving accuracy, diversity, and other key metrics. This work establishes a novel evaluation standard for explainable recommendation and introduces a scalable, unified modeling paradigm that jointly optimizes predictive and explanatory objectives.
📝 Abstract
Providing natural language explanations for recommendations is particularly useful from the perspective of a non-expert user. Although several methods for providing such explanations have recently been proposed, we argue that an important aspect of explanation quality has been overlooked in their experimental evaluation. Specifically, the coherence between the generated text and the predicted rating, which is a necessary condition for an explanation to be useful, is not properly captured by currently used evaluation measures. In this paper, we highlight the issue of explanation and prediction coherence by (1) presenting results from a manual verification of explanations generated by one of the state-of-the-art approaches; (2) proposing a method of automatic coherence evaluation; (3) introducing a new transformer-based method that aims to produce more coherent explanations than the state-of-the-art approaches; and (4) performing an experimental evaluation which demonstrates that this method significantly improves explanation coherence without affecting the other aspects of recommendation performance.
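To make the notion of explanation and prediction coherence concrete, the following is a minimal sketch of one way an automatic check could be set up. It is not the evaluation method proposed in the paper, whose details are not given here. It assumes that a rating above a scale midpoint counts as positive, and it uses a toy keyword counter (`keyword_polarity`, a hypothetical stand-in for a trained sentiment classifier) to assign a polarity to each explanation. All function names, the midpoint of 3.0, and the word lists are illustrative assumptions.

```python
import re
from typing import Callable, Sequence


def rating_polarity(rating: float, midpoint: float = 3.0) -> int:
    """Map a predicted rating to +1 (positive) or -1 (negative).

    Assumption: ratings strictly above the midpoint are positive.
    """
    return 1 if rating > midpoint else -1


def keyword_polarity(text: str) -> int:
    """Toy stand-in for a sentiment classifier (hypothetical, illustration only)."""
    positive = {"great", "good", "excellent", "love", "comfortable"}
    negative = {"bad", "poor", "terrible", "dirty", "noisy"}
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return 1 if score >= 0 else -1


def coherence_rate(
    explanations: Sequence[str],
    ratings: Sequence[float],
    text_polarity: Callable[[str], int] = keyword_polarity,
) -> float:
    """Fraction of (explanation, rating) pairs whose polarities agree."""
    if len(explanations) != len(ratings):
        raise ValueError("explanations and ratings must have the same length")
    if not explanations:
        return 0.0
    agree = sum(
        text_polarity(text) == rating_polarity(rating)
        for text, rating in zip(explanations, ratings)
    )
    return agree / len(explanations)
```

Under these assumptions, an explanation such as "the staff was terrible and the room was dirty" paired with a predicted rating of 4.2 would count as incoherent. Swapping `keyword_polarity` for a real classifier only requires passing a different `text_polarity` callable.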