🤖 AI Summary
This paper addresses the lack of fair, reproducible evaluation standards for automated bidding algorithms—including classical control and reinforcement learning (RL) approaches—by proposing the first standardized, multi-dimensional open-source evaluation framework. Built upon an industrial-grade ad auction simulation environment, the framework uniformly defines core metrics such as performance, cost control, and budget pacing, enabling systematic comparison across controller-based algorithms, RL models, and optimal closed-form methods. Its key contributions are: (1) a cross-algorithm, cross-platform-compatible evaluation protocol; (2) empirical identification of implicit trade-offs and failure boundaries of diverse methods under varying objectives (e.g., ROI, impression volume, budget consumption rate); and (3) reproducible benchmark results and practical algorithm selection guidelines. Extensive experiments validate the framework’s effectiveness and generalizability, providing industry practitioners with transparent, robust decision support for algorithm selection.
📝 Abstract
Advertisement auctions play a crucial role in revenue generation for e-commerce companies. To make the bidding procedure scalable to thousands of auctions, automatic bidding (autobidding) algorithms are actively developed in industry. Therefore, fair and reproducible evaluation of autobidding algorithms is an important problem. We present a standardized and transparent evaluation protocol for comparing classical and reinforcement learning (RL) autobidding algorithms. We consider the most efficient autobidding algorithms from different classes, e.g., ones based on controllers, RL, and optimal formulas, and benchmark them in a bidding environment. We utilize the most recent open-source environment developed in the industry, which accurately emulates the bidding process. Our work demonstrates the most promising use cases for the considered autobidding algorithms, highlights their surprising drawbacks, and evaluates them according to multiple metrics. We select evaluation metrics that capture the performance of the autobidding algorithms, the corresponding costs, and budget pacing. This choice of metrics makes our results applicable to a broad range of platforms where autobidding is effective. The presented comparison results help practitioners evaluate candidate autobidding algorithms from different perspectives and select those that are efficient according to their companies' targets.
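To make the three metric families mentioned in the abstract concrete, here is a minimal sketch of how such campaign-level metrics could be computed from per-step bidding logs. This is an illustration only: the function name, inputs, and exact formulas (ROI, budget consumption rate, pacing deviation from a linear spend schedule) are assumptions for exposition, not the paper's actual framework definitions.

```python
# Illustrative sketch only: these formulas are common-sense stand-ins,
# not taken from the paper's evaluation framework.

def evaluate_campaign(spend, value, budget, steps):
    """Compute simple autobidding evaluation metrics (hypothetical).

    spend  : list of ad cost per time step
    value  : list of realized value (e.g., conversion value) per time step
    budget : total campaign budget
    steps  : number of time steps in the campaign
    """
    total_spend = sum(spend)
    total_value = sum(value)
    # Performance: value obtained per unit of cost.
    roi = total_value / total_spend if total_spend > 0 else 0.0
    # Cost control: fraction of the allotted budget actually spent.
    budget_consumption_rate = total_spend / budget
    # Budget pacing: mean absolute gap between cumulative spend and an
    # even (linear) spend schedule, normalized by the total budget.
    pacing_dev = 0.0
    cum_spend = 0.0
    for t, s in enumerate(spend, start=1):
        cum_spend += s
        ideal = budget * t / steps
        pacing_dev += abs(cum_spend - ideal)
    pacing_dev /= steps * budget
    return {
        "roi": roi,
        "budget_consumption_rate": budget_consumption_rate,
        "pacing_deviation": pacing_dev,
    }
```

A perfectly paced campaign that spends its whole budget evenly yields a pacing deviation of 0 and a budget consumption rate of 1; comparing algorithms on all three numbers at once exposes the trade-offs the paper discusses (e.g., high ROI achieved by severe underspending).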