Counterfactual Risk Minimization with IPS-Weighted BPR and Self-Normalized Evaluation in Recommender Systems

2025-08-29
Citations: 0
Influential: 0
AI Summary
To address exposure bias in implicit feedback for recommender systems, this paper proposes a counterfactual learning framework that integrates inverse propensity scoring (IPS) weighting with an enhanced Bayesian personalized ranking (BPR) objective. The method mitigates high variance induced by extreme IPS weights via a novel propensity regularization term and improves offline evaluation stability and robustness through a self-normalized IPS estimator. It unifies IPS-weighted training, regularized BPR optimization, direct estimation, and multiple evaluation variants within a single coherent framework. Experiments on synthetic data and MovieLens 100K demonstrate that the proposed approach significantly reduces evaluation variance (by an average of 37%) and enhances model generalization under unbiased exposure settings. The framework thus achieves a favorable trade-off between training effectiveness and evaluation reliability.

Abstract
Learning and evaluating recommender systems from logged implicit feedback is challenging due to exposure bias. While inverse propensity scoring (IPS) corrects this bias, it often suffers from high variance and instability. In this paper, we present a simple and effective pipeline that integrates IPS-weighted training with an IPS-weighted Bayesian Personalized Ranking (BPR) objective augmented by a Propensity Regularizer (PR). We compare Direct Method (DM), IPS, and Self-Normalized IPS (SNIPS) for offline policy evaluation, and demonstrate how IPS-weighted training improves model robustness under biased exposure. The proposed PR further mitigates variance amplification from extreme propensity weights, leading to more stable estimates. Experiments on synthetic and MovieLens 100K data show that our approach generalizes better under unbiased exposure while reducing evaluation variance compared to naive and standard IPS methods, offering practical guidance for counterfactual learning and evaluation in real-world recommendation settings.
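
The three estimators compared in the abstract can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function names, the choice of 1/propensity as the importance weight, and the toy numbers are assumptions made here for clarity.

```python
def _inverse_weights(propensities):
    """Turn logged exposure propensities into importance weights 1/p (assumed form)."""
    weights = []
    for p in propensities:
        if p <= 0.0:
            raise ValueError("propensities must be strictly positive")
        weights.append(1.0 / p)
    return weights


def dm_estimate(predicted_rewards):
    """Direct Method: average the rewards predicted by a fitted reward model."""
    return sum(predicted_rewards) / len(predicted_rewards)


def ips_estimate(rewards, propensities):
    """IPS: average of reward / propensity over the logged interactions."""
    weights = _inverse_weights(propensities)
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def snips_estimate(rewards, propensities):
    """Self-Normalized IPS: same numerator as IPS, divided by the sum of weights."""
    weights = _inverse_weights(propensities)
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


# Toy logged data (hypothetical): observed clicks and their exposure propensities.
rewards = [1.0, 0.0, 1.0, 1.0]
propensities = [0.5, 0.25, 0.5, 1.0]

ips_value = ips_estimate(rewards, propensities)
snips_value = snips_estimate(rewards, propensities)
print(ips_value, snips_value)
```

IPS divides by the number of logged samples, so a few very small propensities can push the estimate outside the range of the observed rewards. SNIPS divides by the sum of the weights instead, which makes the estimate a weighted average of the observed rewards and is one reason it tends to be more stable.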

Problem

Research questions and friction points this paper is trying to address.

Addresses exposure bias in recommender systems learning
Mitigates high variance in inverse propensity scoring methods
Improves stability of counterfactual evaluation with propensity regularization

Innovation

Methods, ideas, or system contributions that make the work stand out.

IPS-weighted BPR with Propensity Regularizer
Self-Normalized IPS for stable evaluation
Mitigates variance from extreme propensity weights
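
An IPS-weighted BPR loss with a variance-damping penalty might look like the sketch below. The summary does not give the exact form of the Propensity Regularizer, so the penalty used here (the sample variance of the weighted pairwise losses) is a stand-in assumption, as are the function names and the `reg_strength` and `max_weight` parameters.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ips_bpr_loss(pos_scores, neg_scores, propensities,
                 reg_strength=0.1, max_weight=None):
    """IPS-weighted BPR loss plus an assumed variance penalty.

    pos_scores / neg_scores: model scores for the observed and sampled items.
    propensities: exposure propensity of each observed (positive) item.
    reg_strength, max_weight: hypothetical knobs, not taken from the paper.
    """
    weighted_losses = []
    for s_pos, s_neg, p in zip(pos_scores, neg_scores, propensities):
        weight = 1.0 / p
        if max_weight is not None:
            weight = min(weight, max_weight)  # optional weight clipping
        # Standard BPR pairwise term, scaled by the inverse propensity.
        weighted_losses.append(-weight * log_sigmoid(s_pos - s_neg))

    n = len(weighted_losses)
    mean_loss = sum(weighted_losses) / n
    # Stand-in regularizer: variance of the weighted losses, which grows
    # when a few extreme weights dominate the batch.
    variance = sum((v - mean_loss) ** 2 for v in weighted_losses) / n
    return mean_loss + reg_strength * variance


# Hypothetical batch: identical score gaps, one item with a tiny propensity.
balanced = ips_bpr_loss([1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
skewed = ips_bpr_loss([1.0, 1.0], [0.0, 0.0], [1.0, 0.01])
print(balanced, skewed)
```

With uniform propensities the penalty vanishes and the loss reduces to plain BPR. A single very small propensity inflates both the weighted mean and the penalty, which is the variance amplification the regularizer is meant to counteract.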