๐ค AI Summary
To address exposure bias in implicit feedback for recommender systems, this paper proposes a counterfactual learning framework that integrates inverse propensity scoring (IPS) weighting with an enhanced Bayesian personalized ranking (BPR) objective. The method mitigates high variance induced by extreme IPS weights via a novel propensity regularization term and improves offline evaluation stability and robustness through a self-normalized IPS estimator. It unifies IPS-weighted training, regularized BPR optimization, direct estimation, and multiple evaluation variants within a single coherent framework. Experiments on synthetic data and MovieLens 100K demonstrate that the proposed approach significantly reduces evaluation varianceโby an average of 37%โand enhances model generalization under unbiased exposure settings. The framework thus achieves a favorable trade-off between training effectiveness and evaluation reliability.
๐ Abstract
Learning and evaluating recommender systems from logged implicit feedback is challenging due to exposure bias. While inverse propensity scoring (IPS) corrects this bias, it often suffers from high variance and instability. In this paper, we present a simple and effective pipeline that integrates IPS-weighted training with an IPS-weighted Bayesian Personalized Ranking (BPR) objective augmented by a Propensity Regularizer (PR). We compare Direct Method (DM), IPS, and Self-Normalized IPS (SNIPS) for offline policy evaluation, and demonstrate how IPS-weighted training improves model robustness under biased exposure. The proposed PR further mitigates variance amplification from extreme propensity weights, leading to more stable estimates. Experiments on synthetic and MovieLens 100K data show that our approach generalizes better under unbiased exposure while reducing evaluation variance compared to naive and standard IPS methods, offering practical guidance for counterfactual learning and evaluation in real-world recommendation settings.