IntOPE: Off-Policy Evaluation in the Presence of Interference

📅 2024-08-24

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

232K/year

🤖 AI Summary

Traditional off-policy evaluation (OPE) relies on the Stable Unit Treatment Value Assumption (SUTVA), rendering it inadequate for interference settings where individual rewards depend on others’ actions—e.g., social recommendation or personalized healthcare. This work relaxes SUTVA by introducing IntIPW, the first OPE estimator explicitly modeling neighborhood interference via graph-structured dependencies. IntIPW jointly marginalizes over both individual actions and interference effects using importance weights derived from a causal graph, enabling unbiased and consistent OPE within an inverse probability weighting framework. We establish its asymptotic unbiasedness and consistency under mild assumptions. Empirical evaluation on synthetic benchmarks and real-world social recommendation datasets demonstrates that IntIPW reduces estimation bias by up to 42% compared to state-of-the-art OPE methods, while maintaining controlled variance—significantly advancing robustness and accuracy in interference-aware policy evaluation.

Technology Category

Application Category

📝 Abstract

Off-Policy Evaluation (OPE) is employed to assess the potential impact of a hypothetical policy using logged contextual bandit feedback, which is crucial in areas such as personalized medicine and recommender systems, where online interactions are associated with significant risks and costs. Traditionally, OPE methods rely on the Stable Unit Treatment Value Assumption (SUTVA), which assumes that the reward for any given individual is unaffected by the actions of others. However, this assumption often fails in real-world scenarios due to the presence of interference, where an individual's reward is affected not just by their own actions but also by the actions of their peers. This realization reveals significant limitations of existing OPE methods in real-world applications. To address this limitation, we propose IntIPW, an IPW-style estimator that extends the Inverse Probability Weighting (IPW) framework by integrating marginalized importance weights to account for both individual actions and the influence of adjacent entities. Extensive experiments are conducted on both synthetic and real-world data to demonstrate the effectiveness of the proposed IntIPW method.

Problem

Research questions and friction points this paper is trying to address.

Evaluating policies with interference among individuals

Overcoming SUTVA limitation in real-world scenarios

Extending IPW to account for peer influence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends IPW with marginalized importance weights

Accounts for individual and peer actions

Validated on synthetic and real-world data

🔎 Similar Papers

A Comprehensive Survey of Contamination Detection Methods in Large Language Models