AI Summary
This work systematically investigates the theoretical impact of three loss functions, Categorical Cross-Entropy (CCE), Binary Cross-Entropy (BCE), and Bayesian Personalized Ranking (BPR), on ranking performance (NDCG/MRR) under negative sampling in recommender systems.
Method: Leveraging convex optimization, probabilistic inequalities, and ranking bound analysis, we derive tight theoretical bounds on NDCG and MRR under varying negative sampling schemes.
Contribution/Results: We establish four key results: (1) when the full set of negative items is used, CCE yields the tightest lower bound on NDCG/MRR, followed by BPR and BCE; (2) with a single sampled negative, BPR and CCE have strictly equivalent optimization objectives; (3) we introduce a probabilistic lower-bound framework for loss functions under negative sampling and prove that all three losses converge to the same global optimum; (4) in the worst case under negative sampling, BCE provides the strongest guarantee on NDCG/MRR. These theoretical findings are empirically validated on five public benchmark datasets and four state-of-the-art recommendation models.
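To make the three objectives concrete, here is a minimal NumPy sketch (function names and signatures are ours, not the paper's code) of each loss for one positive item and a set of sampled negatives. With a single negative, the CCE and BPR values coincide, matching result (2):

```python
import numpy as np

def cce_loss(pos_score, neg_scores):
    """Categorical cross-entropy: -log softmax of the positive
    over the positive plus the sampled negatives."""
    scores = np.concatenate(([pos_score], neg_scores))
    return -pos_score + np.log(np.sum(np.exp(scores)))

def bce_loss(pos_score, neg_scores):
    """Binary cross-entropy: positive labelled 1, each sampled negative labelled 0."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return -np.log(sigmoid(pos_score)) - np.sum(np.log(1.0 - sigmoid(neg_scores)))

def bpr_loss(pos_score, neg_scores):
    """BPR: -log sigmoid of the pairwise score difference, summed over negatives."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return -np.sum(np.log(sigmoid(pos_score - neg_scores)))
```

For example, with one negative, `cce_loss(2.0, np.array([1.0]))` and `bpr_loss(2.0, np.array([1.0]))` both evaluate to `log(1 + e^{-1})`, since `-log softmax` over two items is exactly `-log sigmoid` of their score difference.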
Abstract
Loss functions like Categorical Cross-Entropy (CCE), Binary Cross-Entropy (BCE), and Bayesian Personalized Ranking (BPR) are commonly used when training Recommender Systems (RSs) to differentiate positive items (those a user has interacted with) from negative items. While prior works empirically showed that CCE outperforms BCE and BPR when using the full set of negative items, we provide a theoretical explanation by proving that CCE offers the tightest lower bound on ranking metrics such as Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR), followed by BPR and BCE. However, using the full set of negative items is computationally infeasible for large-scale RSs, prompting the use of negative sampling techniques. Under negative sampling, we reveal that BPR and CCE are equivalent when a single negative sample is drawn, and that all three losses converge to the same global minimum. We further demonstrate that the sampled losses remain lower bounds on NDCG (MRR), albeit in a probabilistic sense. Our worst-case analysis shows that BCE offers the strongest bound on NDCG (MRR). Experiments on five datasets and four models empirically support these theoretical findings. Our code is available at https://anonymous.4open.science/r/recsys_losses.
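In the single-positive setting the abstract works with, both ranking metrics reduce to simple functions of the positive item's rank: MRR is the reciprocal rank, and NDCG is the DCG at that rank divided by an ideal DCG of 1. A minimal sketch (function names are ours; ties are broken pessimistically against the positive):

```python
import numpy as np

def rank_of_positive(pos_score, neg_scores):
    """1-based rank of the positive item; ties count against the positive."""
    return 1 + int(np.sum(neg_scores >= pos_score))

def mrr(pos_score, neg_scores):
    """Mean Reciprocal Rank for a single relevant item: 1 / rank."""
    return 1.0 / rank_of_positive(pos_score, neg_scores)

def ndcg(pos_score, neg_scores):
    """NDCG for a single relevant item: DCG = 1/log2(1 + rank),
    and the ideal DCG (rank 1) is 1/log2(2) = 1."""
    return 1.0 / np.log2(1.0 + rank_of_positive(pos_score, neg_scores))
```

For instance, if the positive scores 2.0 against negatives scoring 3.0 and 1.0, its rank is 2, so MRR is 0.5 and NDCG is 1/log2(3); if the positive ranks first, both metrics equal 1.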