AI Summary
This work systematically investigates the theoretical impact of three loss functions, Categorical Cross-Entropy (CCE), Binary Cross-Entropy (BCE), and Bayesian Personalized Ranking (BPR), on ranking performance (NDCG/MRR) under negative sampling in recommender systems.
Method: Leveraging convex optimization, probabilistic inequalities, and ranking bound analysis, we derive tight theoretical bounds on NDCG and MRR under varying negative sampling schemes.
Contribution/Results: We establish four key results: (1) when the full set of negative items is used, CCE yields the tightest lower bound on NDCG/MRR, followed by BPR and BCE; (2) with a single sampled negative, BPR and CCE have strictly equivalent optimization objectives; (3) we introduce a probabilistic lower-bound framework for loss functions under negative sampling and prove that all three losses converge to the same global optimum; (4) in the worst case under negative sampling, BCE provides the strongest guarantee on NDCG/MRR. These theoretical findings are empirically validated on five public benchmark datasets and four state-of-the-art recommendation models.
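To make the three objectives concrete, here is a minimal NumPy sketch (function names and signatures are ours, not the paper's code) of each loss for one positive item and a set of sampled negatives. With a single negative, the CCE and BPR values coincide, matching result (2):

```python
import numpy as np

def cce_loss(pos_score, neg_scores):
    """Categorical cross-entropy: -log softmax of the positive
    over the positive plus the sampled negatives."""
    scores = np.concatenate(([pos_score], neg_scores))
    return -pos_score + np.log(np.sum(np.exp(scores)))

def bce_loss(pos_score, neg_scores):
    """Binary cross-entropy: positive labelled 1, each sampled negative labelled 0."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return -np.log(sigmoid(pos_score)) - np.sum(np.log(1.0 - sigmoid(neg_scores)))

def bpr_loss(pos_score, neg_scores):
    """BPR: -log sigmoid of the pairwise score difference, summed over negatives."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return -np.sum(np.log(sigmoid(pos_score - neg_scores)))
```

For example, with one negative, `cce_loss(2.0, np.array([1.0]))` and `bpr_loss(2.0, np.array([1.0]))` both evaluate to `log(1 + e^{-1})`, since `-log softmax` over two items is exactly `-log sigmoid` of their score difference.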
Abstract
Loss functions like Categorical Cross-Entropy (CCE), Binary Cross-Entropy (BCE), and Bayesian Personalized Ranking (BPR) are commonly used when training Recommender Systems (RSs) to differentiate positive items (those a user has interacted with) from negative items. While prior works empirically showed that CCE outperforms BCE and BPR when using the full set of negative items, we provide a theoretical explanation by proving that CCE offers the tightest lower bound on ranking metrics such as Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR), followed by BPR and BCE. However, using the full set of negative items is computationally infeasible for large-scale RSs, prompting the use of negative sampling techniques. Under negative sampling, we reveal that BPR and CCE are equivalent when a single negative sample is drawn, and that all three losses converge to the same global minimum. We further demonstrate that the sampled losses remain lower bounds on NDCG (MRR), albeit in a probabilistic sense. Our worst-case analysis shows that BCE offers the strongest bound on NDCG (MRR). Experiments on five datasets and four models empirically support these theoretical findings. Our code is available at https://anonymous.4open.science/r/recsys_losses.
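In the single-positive setting the abstract works with, both ranking metrics reduce to simple functions of the positive item's rank: MRR is the reciprocal rank, and NDCG is the DCG at that rank divided by an ideal DCG of 1. A minimal sketch (function names are ours; ties are broken pessimistically against the positive):

```python
import numpy as np

def rank_of_positive(pos_score, neg_scores):
    """1-based rank of the positive item; ties count against the positive."""
    return 1 + int(np.sum(neg_scores >= pos_score))

def mrr(pos_score, neg_scores):
    """Mean Reciprocal Rank for a single relevant item: 1 / rank."""
    return 1.0 / rank_of_positive(pos_score, neg_scores)

def ndcg(pos_score, neg_scores):
    """NDCG for a single relevant item: DCG = 1/log2(1 + rank),
    and the ideal DCG (rank 1) is 1/log2(2) = 1."""
    return 1.0 / np.log2(1.0 + rank_of_positive(pos_score, neg_scores))
```

For instance, if the positive scores 2.0 against negatives scoring 3.0 and 1.0, its rank is 2, so MRR is 0.5 and NDCG is 1/log2(3); if the positive ranks first, both metrics equal 1.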