🤖 AI Summary
Problem: Conversion rate (CVR) prediction in online advertising suffers from inconsistent evaluation protocols, fragmented methodological taxonomies, and unresolved challenges including data sparsity, selection bias, and causal confounding.
Method: We systematically survey CVR modeling paradigms and propose a unified taxonomy of six categories: statistical models, machine learning, deep neural networks, multi-task learning, causal inference, and graph neural networks. We also summarize the performance that prior studies report for CVR models on public and proprietary datasets, in order to identify sources of evaluation inconsistency.
Contribution/Results: Our analysis shows that performance results reported in prior studies are inconsistent, and it identifies key limitations of current methods. We outline a structured roadmap for CVR research with four future directions: semantic enhancement, attribution optimization, debiased learning, and joint CTR-CVR modeling. The review connects the techniques behind each model category with the empirical evidence for them, as a reference for researchers and practitioners.
📝 Abstract
Conversion and conversion rate (CVR) prediction play a critical role in efficient advertising decision-making. Over the past decades, researchers have developed many models for CVR prediction, yet the methodological evolution of these models and the relationships between different techniques have been largely overlooked. In this paper, we conduct a comprehensive literature review of CVR prediction in online advertising, classify state-of-the-art CVR prediction models into six categories according to their underlying techniques, and elaborate on the connections between these techniques. For each category, we present the framework of the underlying techniques, discuss their advantages and disadvantages, and describe how they are used for CVR prediction. Moreover, we summarize the performance of various CVR prediction models on public and proprietary datasets. Finally, we identify research trends, major challenges, and promising future directions. We observe that the performance results reported in prior studies are not consistent. Semantics-enriched, attribution-enhanced, and debiased CVR prediction, as well as joint modeling of CTR and CVR prediction, are promising directions for future work. This review is expected to provide valuable references and insights for future researchers and practitioners in this area.
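To make "jointly modeling CTR and CVR prediction" concrete, the following is a minimal sketch of one well-known formulation, the ESMM-style decomposition pCTCVR = pCTR × pCVR. It is shown only as an illustration and is not necessarily the method this survey favors. The loss is computed over all impressions rather than only clicked ones. The CTR head is supervised by click labels, and the product pCTCVR is supervised by "clicked and converted" labels. This sidesteps the selection bias that arises from fitting a CVR model only on clicked samples. The function names are hypothetical, and `p_ctr`/`p_cvr` are assumed to be the outputs of two model towers.

```python
import math
from typing import Sequence


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a single prediction; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def joint_ctr_cvr_loss(
    p_ctr: Sequence[float],
    p_cvr: Sequence[float],
    clicks: Sequence[int],
    conversions: Sequence[int],
) -> float:
    """Hypothetical ESMM-style joint loss, averaged over ALL impressions.

    The CVR head is never fit directly on the clicked-only subset; it is
    trained through the product pCTCVR = pCTR * pCVR, whose label is
    click AND conversion (a conversion is only observed after a click).
    """
    n = len(p_ctr)
    total = 0.0
    for pc, pv, y_click, y_conv in zip(p_ctr, p_cvr, clicks, conversions):
        p_ctcvr = pc * pv  # probability of click-then-convert
        total += bce(pc, y_click) + bce(p_ctcvr, y_click * y_conv)
    return total / n


# Usage: three impressions, only the second one is clicked and converts.
loss = joint_ctr_cvr_loss([0.2, 0.5, 0.1], [0.3, 0.4, 0.05], [0, 1, 0], [0, 1, 0])
```

Supervising the product rather than pCVR itself means that every impression contributes a gradient to the CVR tower. This is the design choice that lets such models train on the entire impression space.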