Toward a benchmark for CTR prediction in online advertising: datasets, evaluation protocols and perspectives

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of a unified benchmark platform in click-through rate (CTR) prediction, this paper introduces Bench-CTR: an open-source benchmark enabling fair, cross-paradigm evaluation of traditional statistical models, deep learning models, and large language model (LLM)-based approaches. Bench-CTR integrates both real-world and synthetic datasets and establishes a standardized evaluation framework, including unified data interfaces, consistent train/validation/test protocols, a taxonomy of metrics, and reproducible experimental guidelines. Key findings: (1) high-order models largely outperform low-order ones, though the advantage varies by metric and dataset; (2) LLM-based methods achieve performance comparable to other models using only 2% of the training data, demonstrating notable data efficiency; and (3) CTR model performance improved sharply during 2015-2016 but has progressed slowly since, a trend consistent across datasets. Bench-CTR provides infrastructure for mechanistic analysis and next-generation model development.

📝 Abstract
This research designs a unified architecture for a CTR prediction benchmark (Bench-CTR) platform that offers flexible interfaces to datasets and to components of a wide range of CTR prediction models. Moreover, we construct a comprehensive system of evaluation protocols encompassing real-world and synthetic datasets, a taxonomy of metrics, and standardized procedures and experimental guidelines for calibrating the performance of CTR prediction models. Furthermore, we implement the proposed benchmark platform and conduct a comparative study evaluating a wide range of state-of-the-art models, from traditional multivariate statistical models to modern large language model (LLM)-based approaches, on three public datasets and two synthetic datasets. Experimental results reveal that: (1) high-order models largely outperform low-order models, though this advantage varies across metrics and datasets; (2) LLM-based models demonstrate remarkable data efficiency, achieving performance comparable to other models while using only 2% of the training data; (3) CTR prediction models improved significantly from 2015 to 2016, then entered a stage of slow progress, a pattern consistent across datasets. This benchmark is expected to facilitate model development and evaluation and to enhance practitioners' understanding of the underlying mechanisms of CTR prediction models. Code is available at https://github.com/NuriaNinja/Bench-CTR.
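The abstract describes a standardized evaluation protocol: fixed train/validation/test splits shared by all models, plus a common metric suite. A minimal sketch of what such a protocol might look like is below; the function names (`train_val_test_split`, `log_loss`, `auc`) are illustrative assumptions, not the actual Bench-CTR API.

```python
# Hypothetical sketch of a standardized CTR evaluation protocol:
# a deterministic split so every model sees identical partitions,
# and the two metrics most common in CTR work (log loss and AUC).
import math
import random

def train_val_test_split(rows, seed=42, ratios=(0.8, 0.1, 0.1)):
    """Shuffle with a fixed seed and split, so all models are
    evaluated on exactly the same train/validation/test partitions."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def log_loss(labels, probs, eps=1e-12):
    """Mean negative log-likelihood of predicted click probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def auc(labels, probs):
    """AUC as the fraction of (positive, negative) pairs the model
    ranks correctly; ties count as half a correct pair."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Usage: evaluate predicted click probabilities against binary labels.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.2]
print(auc(labels, probs))       # 3 of 4 pairs ranked correctly -> 0.75
print(round(log_loss(labels, probs), 3))
```

Fixing the split seed is what makes cross-paradigm comparisons fair: a traditional model and an LLM-based model are scored on byte-identical test rows.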
Problem

Research questions and friction points this paper is trying to address.

CTR prediction lacks a unified benchmark platform for fair model comparison
Evaluation protocols are inconsistent across studies and datasets
It is unclear how LLM-based approaches compare with traditional CTR models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified benchmark platform with flexible model interfaces
Comprehensive evaluation system with real and synthetic datasets
Comparative study of traditional and LLM-based CTR models
Shan Gao
School of Management, Huazhong University of Science and Technology, Wuhan 43004, China
Yanwu Yang
University Tuebingen Hospital, Harbin Institute of Technology
neuroscience, medical image, graph neural network, brain connectome