🤖 AI Summary
To address information overload in online product reviews and the mismatch between generic summaries and individual user needs, this paper proposes the first persona-aware two-stage alignment framework for controllable LLM-based review summarization. Stage I applies asymmetric knowledge-distillation fine-tuning to sharpen the model's sensitivity to user personas (e.g., price-sensitive or quality-oriented). Stage II introduces preference-estimator-driven RLAIF (Reinforcement Learning from AI Feedback) optimization to jointly align summaries with factual accuracy, logical consistency, and fine-grained user preferences. The authors construct a multi-dimensional evaluation suite (rule-based metrics, LLM-as-a-judge assessments, and human evaluation) on the large-scale Amazon 2023 review dataset. Experiments demonstrate statistically significant improvements over state-of-the-art baselines across all metrics, with strong cross-category generalization.
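The Stage II reward described above can be pictured as a scalar combining the three alignment signals. The sketch below is a hypothetical illustration only: the weights, function names, and the assumption that each signal arrives as a score in [0, 1] are ours, not the paper's.

```python
# Hypothetical sketch of a Stage II-style RLAIF reward: a weighted
# combination of factual accuracy, logical consistency, and a
# preference-estimator score for the target persona.
# All weights and signatures are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class RewardWeights:
    factuality: float = 0.4   # assumed weight, not from the paper
    consistency: float = 0.3
    preference: float = 0.3


def rlaif_reward(factuality: float, consistency: float,
                 preference: float,
                 w: RewardWeights = RewardWeights()) -> float:
    """Combine the three alignment signals (each assumed in [0, 1])."""
    for score in (factuality, consistency, preference):
        if not 0.0 <= score <= 1.0:
            raise ValueError("each score must lie in [0, 1]")
    return (w.factuality * factuality
            + w.consistency * consistency
            + w.preference * preference)
```

In an actual RLAIF loop this scalar would be fed to a policy-gradient optimizer; here it only makes the joint-alignment objective concrete.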
📝 Abstract
Online product reviews contain rich but noisy signals that overwhelm users and hinder effective decision-making. Existing LLM-based summarizers remain generic and fail to account for individual preferences, limiting their practical utility. We propose SUMFORU, a steerable review summarization framework that aligns outputs with explicit user personas to support personalized purchase decisions. Our approach integrates a high-quality data pipeline built from the Amazon 2023 Review Dataset with a two-stage alignment procedure: (1) persona-aware Supervised Fine-Tuning (SFT) via asymmetric knowledge distillation, and (2) Reinforcement Learning from AI Feedback (RLAIF) using a preference estimator to capture fine-grained, persona-relevant signals. We evaluate the model across rule-based, LLM-based, and human-centered metrics, demonstrating gains in consistency, grounding, and preference alignment. Our framework achieves the highest performance across all evaluation settings and generalizes effectively to unseen product categories. These results highlight the promise of steerable pluralistic alignment for building next-generation personalized decision-support systems.
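The "explicit user persona" steering interface could look like a persona-conditioned summarization prompt. The sketch below is purely illustrative: the persona labels, guidance strings, and template wording are our assumptions, not SUMFORU's actual prompt format.

```python
# Hypothetical persona-conditioned prompt builder, illustrating how an
# explicit persona could steer an LLM summarizer. The persona catalog
# and template text are assumptions for illustration only.

PERSONAS = {
    "price_sensitive": "Emphasize value for money, discounts, and cost of ownership.",
    "quality_oriented": "Emphasize build quality, materials, and long-term reliability.",
}


def build_prompt(reviews: list[str], persona: str) -> str:
    """Assemble a summarization prompt steered by the given persona."""
    if persona not in PERSONAS:
        raise KeyError(f"unknown persona: {persona}")
    joined = "\n".join(f"- {r}" for r in reviews)
    return (
        f"Summarize the following product reviews for a "
        f"{persona.replace('_', ' ')} shopper.\n"
        f"Guidance: {PERSONAS[persona]}\n"
        f"Reviews:\n{joined}\n"
        f"Summary:"
    )
```

Swapping the persona key changes only the guidance line, which is the sense in which such a summarizer is "steerable" at inference time.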