🤖 AI Summary
This work addresses a critical limitation of existing generative recommender systems: during preference optimization they ignore the model's generation confidence and the variation in sample learning difficulty, and they lack any explicit representation of confidence, which leads to unstable training and uncontrolled decision risk. To this end, we propose the first unified adaptive optimization framework that integrates uncertainty modeling into generative recommendation. Our approach employs uncertainty-weighted rewards to suppress overconfident errors, incorporates difficulty-aware optimization to dynamically adjust the learning strategy, and introduces an explicit confidence alignment mechanism. Extensive experiments demonstrate that the proposed method significantly improves both recommendation performance and training stability, effectively prevents performance degradation, and enables downstream risk-aware applications.
📝 Abstract
Generative Recommendation has emerged as a transformative paradigm, reformulating recommendation as an end-to-end autoregressive sequence generation task. Despite its promise, existing preference optimization methods typically rely on binary outcome correctness, suffering from a systemic limitation we term uncertainty blindness. This issue manifests in the neglect of the model's intrinsic generation confidence, the variation in sample learning difficulty, and the lack of explicit confidence expression, directly leading to unstable training dynamics and unquantifiable decision risks. In this paper, we propose Uncertainty-aware Generative Recommendation (UGR), a unified framework that leverages uncertainty as a critical signal for adaptive optimization. UGR synergizes three mechanisms: (1) an uncertainty-weighted reward to penalize confident errors; (2) difficulty-aware optimization dynamics to prevent premature convergence; and (3) explicit confidence alignment to endow the model with the ability to express its confidence. Extensive experiments demonstrate that UGR not only yields superior recommendation performance but also fundamentally stabilizes training, preventing the performance degradation often observed in standard methods. Furthermore, the learned confidence enables reliable downstream risk-aware applications.
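To make the uncertainty-weighted reward idea concrete, here is a minimal sketch in Python. The abstract does not give UGR's actual formulation, so this is an illustrative assumption: sequence-level confidence is approximated by the geometric mean of token probabilities, and the penalty for an incorrect generation is scaled by that confidence so that confident errors are suppressed more strongly than uncertain ones. The function names and the `base_reward` parameter are hypothetical.

```python
import math

def sequence_confidence(token_logprobs):
    """Geometric-mean token probability as a simple sequence-level
    confidence proxy (an assumption, not UGR's actual estimator)."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

def uncertainty_weighted_reward(is_correct, confidence, base_reward=1.0):
    """Correct generations keep the base reward; incorrect ones are
    penalized in proportion to the model's confidence, so overconfident
    errors receive the largest negative signal."""
    if is_correct:
        return base_reward
    return -base_reward * confidence
```

For example, a wrong recommendation generated with high token probabilities (log-probs near 0) is penalized more heavily than one the model was already unsure about, which is the intuition behind suppressing overconfident errors.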