🤖 AI Summary
This work addresses the suboptimal variance control and lack of theoretical guarantees in the self-normalized inverse propensity score (SNIPS) estimator for off-policy evaluation (OPE). To overcome these limitations, the authors propose the β*-IPS estimator, which replaces the conventional multiplicative self-normalization with an optimal additive control variate (i.e., a baseline correction). Theoretical analysis establishes, for the first time, that the optimal additive baseline strictly dominates SNIPS in asymptotic mean squared error. Moreover, the study shows that SNIPS is asymptotically equivalent to employing a specific, but generally suboptimal, additive baseline. This contribution provides a theoretically grounded and empirically effective estimation method for OPE in recommendation and ranking systems.
📝 Abstract
Off-policy evaluation (OPE) is essential for assessing ranking and recommendation systems without costly online interventions. Self-Normalised Inverse Propensity Scoring (SNIPS) is a standard tool for variance reduction in OPE, leveraging a multiplicative control variate. Recent advances in off-policy learning suggest that additive control variates (baseline corrections) may offer superior performance, yet theoretical guarantees for evaluation are lacking. This paper provides a definitive answer to that open question: we prove that $\beta^\star$-IPS, an estimator with an optimal additive baseline, asymptotically dominates SNIPS in mean squared error. By analytically decomposing the variance gap, we show that SNIPS is asymptotically equivalent to using a specific -- but generally sub-optimal -- additive baseline. Our results theoretically justify shifting from self-normalisation to optimal baseline corrections for both ranking and recommendation.
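The contrast between the two control variates can be sketched on synthetic logged-bandit data. The snippet below is a minimal illustration, not the paper's implementation: the two-action bandit, the policies, and the variable names are all hypothetical. It compares vanilla IPS, SNIPS (multiplicative self-normalisation), and an additive baseline-corrected estimator whose coefficient is the standard variance-minimising control-variate choice $\beta^\star = \mathrm{Cov}(wr, w)/\mathrm{Var}(w)$, where $w$ denotes the importance weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical two-action bandit: uniform logging policy,
# target policy that prefers action 1.
p_log = np.array([0.5, 0.5])
p_tgt = np.array([0.2, 0.8])
reward_prob = np.array([0.3, 0.7])  # expected reward of each action

actions = rng.choice(2, size=n, p=p_log)
rewards = rng.binomial(1, reward_prob[actions]).astype(float)
w = p_tgt[actions] / p_log[actions]  # importance weights, E[w] = 1

true_value = p_tgt @ reward_prob  # ground-truth policy value: 0.62

# Vanilla IPS: unbiased but high-variance.
ips = np.mean(w * rewards)

# SNIPS: multiplicative control variate (divide by the mean weight).
snips = np.sum(w * rewards) / np.sum(w)

# Additive control variate: (w - 1) has mean zero, so for any beta
#   mean(w * r - beta * (w - 1))
# stays unbiased. The variance-minimising coefficient is
#   beta* = Cov(w*r, w) / Var(w)
# (standard control-variate result; estimated here from the sample).
beta_star = np.cov(w * rewards, w, ddof=1)[0, 1] / np.var(w, ddof=1)
beta_ips = np.mean(w * rewards - beta_star * (w - 1))

# Per-sample variance of the averaged terms: the baseline-corrected
# terms have strictly lower variance than the raw IPS terms.
var_ips = np.var(w * rewards)
var_beta = np.var(w * rewards - beta_star * (w - 1))
```

All three point estimates land near the true policy value at this sample size; the difference the paper analyses is in their (asymptotic) variance, which `var_beta < var_ips` makes visible for the additive correction.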