🤖 AI Summary
To address unreliable outputs and entangled uncertainty sources in large language model (LLM)-based recommendation, this paper proposes the first uncertainty disentanglement framework, decomposing recommendation uncertainty into two orthogonal components: model-intrinsic uncertainty and prompt-induced uncertainty. Methodologically, we integrate probabilistic calibration, multi-sample inference, and sensitivity analysis, and design uncertainty-aware prompt engineering to enable uncertainty attribution and controllable optimization. Experiments demonstrate that our proposed uncertainty metrics exhibit strong negative correlation with recommendation quality (e.g., NDCG@10). On multiple benchmark datasets, our approach reduces overall uncertainty by 18.7% and improves NDCG@10 by 5.2%. The code and model weights are publicly released.
📝 Abstract
Despite the widespread adoption of large language models (LLMs) for recommendation, we demonstrate that LLMs often exhibit uncertainty in their recommendations. To ensure the trustworthy use of LLMs in generating recommendations, we emphasize the importance of assessing the reliability of recommendations generated by LLMs. We start by introducing a novel framework for estimating the predictive uncertainty to quantitatively measure the reliability of LLM-based recommendations. We further propose to decompose the predictive uncertainty into recommendation uncertainty and prompt uncertainty, enabling in-depth analyses of the primary source of uncertainty. Through extensive experiments, we (1) demonstrate predictive uncertainty effectively indicates the reliability of LLM-based recommendations, (2) investigate the origins of uncertainty with decomposed uncertainty measures, and (3) propose uncertainty-aware prompting for a lower predictive uncertainty and enhanced recommendation. Our source code and model weights are available at https://github.com/WonbinKweon/UNC_LLM_REC_WWW2025