Listwise Preference Alignment Optimization for Tail Item Recommendation

๐Ÿ“… 2025-07-02
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing preference alignment methods for tail-item recommendation suffer from three key limitations: reliance on explicit reward modeling, inefficiency in negative sample utilization due to pairwise-only comparison, and lack of tail-specific design. To address these, we propose LPO4Rec, a listwise preference optimization framework. Our method extends the Bradleyโ€“Terry model to enable end-to-end listwise preference learning without explicit reward modeling. We theoretically derive a closed-form optimal policy, proving that the LPO loss is equivalent to maximizing an upper bound on the optimal reward. Furthermore, we introduce adaptive negative sampling and sample reweighting to enhance tail-item recall. Extensive experiments on three public benchmarks demonstrate that LPO4Rec outperforms ten state-of-the-art baselines, achieving up to 50% improvement in tail-item recommendation performance while reducing GPU memory consumption by 17.9% compared to DPO.
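To make the listwise extension concrete, here is a minimal sketch of the derivation pattern that DPO-style methods share, with a single positive item i⁺ compared against a set N of negatives; the paper's exact objective, notation, and reweighting terms may differ:

```latex
% Pairwise Bradley--Terry: the positive beats a single negative.
P(i^+ \succ i^-) = \frac{\exp r(i^+)}{\exp r(i^+) + \exp r(i^-)}

% Listwise extension: the positive beats a whole set N of negatives.
P(i^+ \succ N) = \frac{\exp r(i^+)}{\exp r(i^+) + \sum_{i^- \in N} \exp r(i^-)}

% Closed-form optimal policy of the KL-regularized reward objective,
% solved for the reward:
\pi^*(i \mid x) \propto \pi_{\mathrm{ref}}(i \mid x)\,\exp\big(r(x,i)/\beta\big)
\quad\Longrightarrow\quad
r(x,i) = \beta \log \frac{\pi^*(i \mid x)}{\pi_{\mathrm{ref}}(i \mid x)} + \beta \log Z(x)

% Substituting r into the listwise probability cancels Z(x), yielding a
% reward-model-free listwise loss over the policy alone:
\mathcal{L}_{\mathrm{LPO}} = -\,\mathbb{E}\left[\log
  \frac{\exp\big(\beta \log \frac{\pi_\theta(i^+ \mid x)}{\pi_{\mathrm{ref}}(i^+ \mid x)}\big)}
       {\exp\big(\beta \log \frac{\pi_\theta(i^+ \mid x)}{\pi_{\mathrm{ref}}(i^+ \mid x)}\big)
        + \sum_{i^- \in N} \exp\big(\beta \log \frac{\pi_\theta(i^- \mid x)}{\pi_{\mathrm{ref}}(i^- \mid x)}\big)}
\right]
```

Because the partition function Z(x) cancels, the loss can be optimized end-to-end over π_θ alone, which is what removes the explicit reward model.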

๐Ÿ“ Abstract
Preference alignment has achieved great success in Large Language Models (LLMs) and drawn broad interest in recommendation research. Existing preference alignment methods for recommendation either require explicit reward modeling or only support pairwise preference comparison. The former incurs substantial computational costs, while the latter limits training efficiency on negative samples. Moreover, no existing effort has explored preference alignment solutions for tail-item recommendation. To bridge the above gaps, we propose LPO4Rec, which extends the Bradley-Terry model from pairwise comparison to listwise comparison to improve the efficiency of model training. Specifically, we derive a closed-form optimal policy to enable more efficient and effective training without explicit reward modeling. We also present an adaptive negative sampling and reweighting strategy that prioritizes tail items during optimization and enhances tail-item recommendation performance. Furthermore, we theoretically prove that optimizing the listwise preference optimization (LPO) loss is equivalent to maximizing an upper bound of the optimal reward. Our experiments on three public datasets show that our method outperforms 10 baselines by a large margin, achieving up to 50% performance improvement in tail-item recommendation while reducing GPU memory usage by 17.9% compared with direct preference optimization (DPO). Our code is available at https://github.com/Yuhanleeee/LPO4Rec.
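As a rough implementation sketch, the listwise objective above reduces to a (K+1)-way cross-entropy over implicit rewards, i.e. β-scaled log-ratios between the policy and a frozen reference. The function name, tensor layout, and the optional tail-reweighting hook below are hypothetical, not the paper's code:

```python
import torch
import torch.nn.functional as F

def lpo_loss(policy_logp_pos, policy_logp_neg,
             ref_logp_pos, ref_logp_neg, beta=1.0, neg_weights=None):
    """Hypothetical sketch of a listwise, reward-model-free preference loss.

    policy_logp_pos: (B,)   log pi_theta(i+ | history)
    policy_logp_neg: (B, K) log pi_theta(i- | history) for K sampled negatives
    ref_logp_*:      matching shapes under the frozen reference policy
    neg_weights:     optional (B, K) positive factors that upweight chosen
                     negatives (an assumed stand-in for the paper's
                     tail-aware reweighting).
    """
    # Implicit rewards: beta-scaled log-ratios, as in the DPO family.
    r_pos = beta * (policy_logp_pos - ref_logp_pos)        # (B,)
    r_neg = beta * (policy_logp_neg - ref_logp_neg)        # (B, K)
    if neg_weights is not None:
        # Folding log-weights into the logits reweights negatives
        # inside the listwise softmax (one possible design choice).
        r_neg = r_neg + neg_weights.log()
    # Listwise comparison: the positive (index 0) must beat all K
    # negatives jointly under a (K+1)-way softmax.
    logits = torch.cat([r_pos.unsqueeze(1), r_neg], dim=1)  # (B, K+1)
    targets = torch.zeros(logits.size(0), dtype=torch.long,
                          device=logits.device)
    return F.cross_entropy(logits, targets)
```

With K = 1 and no weights this collapses to the familiar sigmoid-based pairwise DPO loss, which is why a single listwise pass over K negatives is more sample-efficient than K separate pairwise comparisons.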
Problem

Research questions and friction points this paper is trying to address.

Improves tail-item recommendation via listwise preference alignment
Reduces computational cost without explicit reward modeling
Enhances training efficiency with adaptive negative sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Listwise comparison extends Bradley-Terry model
Closed-form optimal policy avoids reward modeling
Adaptive negative sampling prioritizes tail items (see the sketch after this list)
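One way to realize a tail-prioritizing sampler is to draw negatives with probability inversely related to item popularity, so that long-tail items enter the comparison list more often than under uniform sampling. This is a minimal sketch under that assumption; the paper's adaptive strategy (e.g. how the bias changes over training) may differ:

```python
import numpy as np

def sample_tail_biased_negatives(item_pop, pos_item, k=16, alpha=0.75, rng=None):
    """Draw k negative item ids, biased toward the long tail.

    item_pop: (num_items,) interaction counts per item
    pos_item: the ground-truth item id to exclude
    alpha:    sharpness of the inverse-popularity bias (assumed knob)
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Inverse-popularity scores: rarer items get higher sampling mass.
    scores = 1.0 / np.power(item_pop.astype(np.float64) + 1.0, alpha)
    scores[pos_item] = 0.0                 # never sample the positive
    probs = scores / scores.sum()
    return rng.choice(item_pop.shape[0], size=k, replace=False, p=probs)
```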
๐Ÿ”Ž Similar Papers
No similar papers found.
Zihao Li
Australian Artificial Intelligence Institute (AAII) and School of Computer Science, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia
Chao Yang
Faculty of Information Science and Engineering, Ocean University of China, Qingdao, China
Tong Zhang
Australian Artificial Intelligence Institute (AAII) and School of Computer Science, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia
Yakun Chen
Australian Artificial Intelligence Institute (AAII) and School of Computer Science, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia
Xianzhi Wang
University of Technology Sydney
Internet of Things, Data Fusion, Machine Learning, Recommender Systems
Guandong Xu
The Education University of Hong Kong, Hong Kong, China
Daoyi Dong
IEEE Fellow, Professor at University of Technology Sydney/Australian National University, Australia
quantum control, control and optimisation, systems engineering, machine learning, renewable energy