Breaking Determinism: Stochastic Modeling for Reliable Off-Policy Evaluation in Ad Auctions

📅 2025-12-03

🤖 AI Summary

Deterministic mechanisms in online advertising auctions assign zero exposure probability to non-winning ads, rendering conventional off-policy evaluation (OPE) methods inapplicable. To address this, we propose the first reliable OPE framework tailored to deterministic auction environments. Our approach integrates stochastic modeling into this setting: we reconstruct propensity scores by modeling bid distributions, yielding robust approximate propensities that keep counterfactual estimators, such as Self-Normalized Inverse Propensity Scoring (SNIPS), applicable. We validate our method on both the AuctionNet simulation benchmark and large-scale industrial data. In CTR prediction, it achieves 92% mean directional accuracy (MDA), significantly outperforming the parametric baseline and aligning closely with online A/B test results. This work establishes a low-risk, high-efficiency paradigm for offline evaluation in large-scale advertising systems.

📝 Abstract

Online A/B testing, the gold standard for evaluating new advertising policies, consumes substantial engineering resources and risks significant revenue loss when underperforming variants are deployed. This motivates the use of Off-Policy Evaluation (OPE) for rapid, offline assessment. However, applying OPE to ad auctions is fundamentally more challenging than in domains like recommender systems, where stochastic policies are common. In online ad auctions, the highest-bidding ad typically wins the impression, which makes the setting deterministic and winner-takes-all. Non-winning ads therefore have zero probability of exposure, rendering standard OPE estimators inapplicable. We introduce the first principled framework for OPE in deterministic auctions by repurposing the bid landscape model to approximate the propensity score. This model allows us to derive robust approximate propensity scores, enabling the use of stable estimators like Self-Normalized Inverse Propensity Scoring (SNIPS) for counterfactual evaluation. We validate our approach on the AuctionNet simulation benchmark and against a two-week online A/B test from a large-scale industrial platform. Our method aligns closely with online results, achieving a 92% Mean Directional Accuracy (MDA) in CTR prediction and significantly outperforming the parametric baseline. MDA is the most critical metric for guiding deployment decisions, as it reflects the ability to correctly predict whether a new model will improve or harm performance. This work contributes the first practical and validated framework for reliable OPE in deterministic auction environments, offering an efficient alternative to costly and risky online experiments.
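To make the idea concrete, here is a minimal Python sketch of how a bid landscape model could supply approximate propensities for SNIPS. It is an illustration under stated assumptions, not the paper's formulation: it assumes the highest competing bid follows a log-normal distribution with hypothetical parameters `mu` and `sigma`, treats the win probability at a given bid as the propensity, and uses the ratio of win probabilities under the target and logging bids as the importance weight. The function names `win_probability` and `snips` are likewise invented for this example.

```python
import math


def win_probability(bid, mu, sigma):
    """Probability that `bid` beats the highest competing bid.

    ASSUMPTION: the highest competing bid is log-normal(mu, sigma), so the
    win probability is the log-normal CDF evaluated at `bid`.
    """
    if bid <= 0:
        return 0.0
    z = (math.log(bid) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def snips(rewards, logged_bids, target_bids, mu, sigma, eps=1e-6):
    """Self-normalized IPS estimate of the target policy's mean reward.

    ASSUMPTION: the importance weight of each logged impression is the ratio
    of the target policy's win probability to the logging policy's win
    probability, both read off the same bid landscape model.
    """
    weighted_reward = 0.0
    weight_sum = 0.0
    for reward, logged_bid, target_bid in zip(rewards, logged_bids, target_bids):
        # Floor the logging propensity so a near-zero value cannot blow up the weight.
        p_logged = max(win_probability(logged_bid, mu, sigma), eps)
        p_target = win_probability(target_bid, mu, sigma)
        weight = p_target / p_logged
        weighted_reward += weight * reward
        weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("all importance weights are zero; estimate is undefined")
    # Dividing by the weight sum (not the sample count) is what makes this SNIPS.
    return weighted_reward / weight_sum
```

When the target policy bids exactly as the logging policy did, every weight equals 1 and the estimate reduces to the plain average of the logged rewards, which is a quick sanity check for any implementation along these lines.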

Problem

Research questions and friction points this paper is trying to address.

Develops a stochastic model for off-policy evaluation in deterministic ad auctions.

Enables reliable counterfactual assessment using approximated propensity scores.

Provides a validated alternative to costly online A/B testing for ad policies.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using a bid landscape model to approximate propensity scores

Enabling the SNIPS estimator for deterministic auction evaluation

Validating the framework with simulation and real-world A/B tests
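The abstract describes Mean Directional Accuracy (MDA) as the ability to correctly predict whether a new model will improve or harm performance. A minimal sketch of that check follows, assuming MDA is the fraction of experiments in which the offline-estimated change has the same sign as the change measured online. The exact definition used in the paper is not given here, and the function name and the numbers in the usage lines are purely illustrative.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def mean_directional_accuracy(offline_deltas, online_deltas):
    """Fraction of experiments where offline and online changes agree in sign.

    ASSUMPTION: each entry is the metric change (e.g. CTR lift) of a candidate
    policy relative to the production policy, one entry per experiment.
    """
    if len(offline_deltas) != len(online_deltas):
        raise ValueError("offline and online lists must have the same length")
    if not offline_deltas:
        raise ValueError("at least one experiment is required")
    hits = sum(
        1
        for offline, online in zip(offline_deltas, online_deltas)
        if sign(offline) == sign(online)
    )
    return hits / len(offline_deltas)


# Illustrative (made-up) CTR lifts for four hypothetical experiments.
offline = [0.02, -0.01, 0.03, -0.04]
online = [0.01, -0.02, -0.01, -0.03]
mda = mean_directional_accuracy(offline, online)  # 3 of 4 directions agree
```

A directional metric like this ignores the size of the predicted lift, which matches the deployment question of whether to ship a candidate at all.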
