Disentangling Likes and Dislikes in Personalized Generative Explainable Recommendation

📅 2024-10-17
🏛️ The Web Conference
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing explainable recommendation methods predominantly evaluate generated explanations by text similarity, neglecting whether the explanations genuinely reflect users' post-purchase sentiment polarity (i.e., like vs. dislike). This misalignment undermines explanation credibility and user trust. Method: We identify this gap and introduce the first generative-explanation dataset explicitly annotated with fine-grained positive and negative opinions, and formulate "sentiment-aware explanation generation" as a new task. Our approach decouples the modeling of positive and negative opinions in user reviews, incorporates predicted ratings as sentiment priors, and proposes a dual-axis evaluation: sentiment consistency and positive/negative opinion coverage. Contribution/Results: Experiments reveal that state-of-the-art models exhibit weak sentiment alignment, and that integrating rating-based sentiment priors significantly improves sentiment accuracy. We publicly release both code and dataset to advance explainable recommendation toward trustworthiness and sentiment fidelity.

📝 Abstract
Recent research on explainable recommendation generally frames the task as a standard text generation problem, and evaluates models simply based on the textual similarity between the predicted and ground-truth explanations. However, this approach fails to consider one crucial aspect of the systems: whether their outputs accurately reflect the users' (post-purchase) sentiments, i.e., whether and why they would like and/or dislike the recommended items. To shed light on this issue, we introduce new datasets and evaluation methods that focus on the users' sentiments. Specifically, we construct the datasets by explicitly extracting users' positive and negative opinions from their post-purchase reviews using an LLM, and propose to evaluate systems based on whether the generated explanations 1) align well with the users' sentiments, and 2) accurately identify both positive and negative opinions of users on the target items. We benchmark several recent models on our datasets and demonstrate that achieving strong performance on existing metrics does not ensure that the generated explanations align well with the users' sentiments. Lastly, we find that existing models can provide more sentiment-aware explanations when the users' (predicted) ratings for the target items are directly fed into the models as input. The datasets and benchmark implementation are available at: https://github.com/jchanxtarov/sent_xrec.
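The abstract's two evaluation axes can be illustrated with a minimal sketch. This is a hypothetical toy version, not the paper's actual metrics or implementation: `sentiment_consistency` checks whether the generated explanation's overall polarity matches the user's ground-truth post-purchase polarity, and `opinion_coverage` checks how many of the LLM-extracted opinion phrases the explanation mentions.

```python
def sentiment_consistency(pred_sentiments, true_sentiments):
    """Fraction of test cases where the explanation's overall polarity
    (e.g. "pos"/"neg") matches the user's ground-truth polarity."""
    assert len(pred_sentiments) == len(true_sentiments)
    matches = sum(p == t for p, t in zip(pred_sentiments, true_sentiments))
    return matches / len(true_sentiments)

def opinion_coverage(explanation, opinions):
    """Fraction of annotated opinion phrases (positive or negative)
    that appear verbatim in the generated explanation."""
    if not opinions:
        return 1.0
    text = explanation.lower()
    return sum(op.lower() in text for op in opinions) / len(opinions)

# Toy usage with hypothetical data
expl = "Great sound quality, but the battery life is short."
print(sentiment_consistency(["pos", "neg"], ["pos", "pos"]))   # 0.5
print(opinion_coverage(expl, ["sound quality", "battery life"]))  # 1.0
```

A real evaluation would use a sentiment classifier and soft phrase matching rather than exact substring lookup; the sketch only shows why textual similarity alone cannot capture either axis.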
Problem

Research questions and friction points this paper is trying to address.

Evaluating explainable recommendations based on sentiment alignment
Identifying both positive and negative user opinions accurately
Improving sentiment-aware explanations using predicted user ratings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extract user sentiments using an LLM
Evaluate explanations with sentiment alignment
Incorporate predicted ratings as model input
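The third idea, feeding the user's predicted rating into the generator as a sentiment prior, can be sketched as a prompt template. The function name and wording below are hypothetical, assuming a rating scale of 1-5; the paper's actual models and prompts may differ.

```python
def build_prompt(item_title, predicted_rating, max_rating=5):
    """Hypothetical prompt template: inject the predicted rating so the
    generator knows whether the user will likely like or dislike the item.
    The 0.6 threshold for polarity is an arbitrary illustrative choice."""
    polarity = "like" if predicted_rating >= max_rating * 0.6 else "dislike"
    return (
        f"The user is predicted to rate '{item_title}' "
        f"{predicted_rating:.1f}/{max_rating} (they will likely {polarity} it). "
        f"Write an explanation covering both what they may like and dislike."
    )

print(build_prompt("Wireless Earbuds X", 4.2))
```

Conditioning on the rating gives the model an explicit polarity signal, which the abstract reports makes the generated explanations noticeably more sentiment-aware.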
Ryotaro Shimizu
University of California San Diego, Waseda University
Takashi Wada
ZOZO Research
Yu Wang
University of California San Diego
Johannes Kruse
University of California San Diego
Sean O'Brien
PhD @ UC San Diego (previously UC Berkeley, Meta AI)
natural language processing, decoding methods, large language models, dark matter
Sai Htaung Kham
ZOZO Research
Linxin Song
University of Southern California
Yuya Yoshikawa
STAIR Lab, Chiba Institute of Technology
Machine Learning, Computer Vision
Yuki Saito
Lecturer (Sr. Assistant Professor), The University of Tokyo
Speech synthesis, Voice conversion, Machine learning
F. Tsung
The Hong Kong University of Science and Technology
Masayuki Goto
Waseda University
Julian J. McAuley
University of California San Diego