Counterfactual Risk Minimization with IPS-Weighted BPR and Self-Normalized Evaluation in Recommender Systems

📅 2025-08-29

📈 Citations: 0

✨ Influential: 0

career value

188K/year

🤖 AI Summary

To address exposure bias in implicit feedback for recommender systems, this paper proposes a counterfactual learning framework that integrates inverse propensity scoring (IPS) weighting with an enhanced Bayesian personalized ranking (BPR) objective. The method mitigates high variance induced by extreme IPS weights via a novel propensity regularization term and improves offline evaluation stability and robustness through a self-normalized IPS estimator. It unifies IPS-weighted training, regularized BPR optimization, direct estimation, and multiple evaluation variants within a single coherent framework. Experiments on synthetic data and MovieLens 100K demonstrate that the proposed approach significantly reduces evaluation variance—by an average of 37%—and enhances model generalization under unbiased exposure settings. The framework thus achieves a favorable trade-off between training effectiveness and evaluation reliability.

Technology Category

Application Category

📝 Abstract

Learning and evaluating recommender systems from logged implicit feedback is challenging due to exposure bias. While inverse propensity scoring (IPS) corrects this bias, it often suffers from high variance and instability. In this paper, we present a simple and effective pipeline that integrates IPS-weighted training with an IPS-weighted Bayesian Personalized Ranking (BPR) objective augmented by a Propensity Regularizer (PR). We compare Direct Method (DM), IPS, and Self-Normalized IPS (SNIPS) for offline policy evaluation, and demonstrate how IPS-weighted training improves model robustness under biased exposure. The proposed PR further mitigates variance amplification from extreme propensity weights, leading to more stable estimates. Experiments on synthetic and MovieLens 100K data show that our approach generalizes better under unbiased exposure while reducing evaluation variance compared to naive and standard IPS methods, offering practical guidance for counterfactual learning and evaluation in real-world recommendation settings.

Problem

Research questions and friction points this paper is trying to address.

Addresses exposure bias in recommender systems learning

Mitigates high variance in inverse propensity scoring methods

Improves stability of counterfactual evaluation with propensity regularization

Innovation

Methods, ideas, or system contributions that make the work stand out.

IPS-weighted BPR with Propensity Regularizer

Self-Normalized IPS for stable evaluation

Mitigates variance from extreme propensity weights

🔎 Similar Papers

Uncertainty Calibration for Counterfactual Propensity Estimation in Recommendation