🤖 AI Summary
Large language models (LLMs) applied to sequential recommendation inherit natural language generation decoding strategies, leading to “objective mismatch”—a misalignment between the text-generation objective and the next-item prediction goal.
Method: We propose a novel decoding framework that jointly leverages semantic clustering and uncertainty-aware adaptation. Specifically, we introduce logit-vector semantic clustering to group semantically equivalent items and integrate entropy-driven uncertainty estimation to adaptively redistribute probability mass and dynamically modulate sampling temperature.
Contribution/Results: Evaluated on six Amazon datasets, our method achieves 18.5%, 11.9%, and 10.8% improvements in HR@3, NDCG@3, and MRR@3, respectively. Cross-domain experiments on H&M and Netflix further demonstrate strong generalization. This work establishes a task-aligned decoding paradigm for LLM-based sequential recommendation, bridging the gap between language modeling and recommendation objectives.
📝 Abstract
Large language models have been widely applied to sequential recommendation tasks, yet during inference, they continue to rely on decoding strategies developed for natural language processing. This creates a mismatch between text-generation objectives and recommendation next item selection objectives. This paper addresses this limitation by proposing an Uncertainty-aware Semantic Decoding (USD) framework that combines logit-based clustering with adaptive scoring to improve next-item predictions. Our approach clusters items with similar logit vectors into semantic equivalence groups, then redistributes probability mass within these clusters and computes entropy across them to control item scoring and sampling temperature during recommendation inference. Experiments on Amazon Product datasets (six domains) gains of 18.5% in HR@3, 11.9% in NDCG@3, and 10.8% in MRR@3 compared to state-of-the-art baselines. Hyperparameter analysis confirms the optimal parameters among various settings, and experiments on H&M, and Netflix datasets indicate that the framework can adapt to differing recommendation domains. The experimental results confirm that integrating semantic clustering and uncertainty assessment yields more reliable and accurate recommendations.