GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing

📅 2025-09-17
🤖 AI Summary
Problem: Existing benchmarks lack targeted evaluation of ad-injected responses in Generative Engine Marketing (GEM), hindering progress in this domain. Method: We introduce GEM-Bench, the first comprehensive benchmark for ad-injected response generation, comprising three datasets spanning chat and search scenarios. We design a multidimensional evaluation framework covering user satisfaction and engagement metrics, and implement several baselines within an extensible multi-agent framework, including a strategy that inserts ads into pre-generated ad-free responses. Contribution/Results: Preliminary experiments indicate that prompt-only methods achieve reasonable engagement (e.g., click-through rate) but often reduce user satisfaction; in contrast, the pre-generate-and-insert strategy mitigates this loss at the cost of additional overhead. This points to a trade-off between efficiency and user experience in GEM. GEM-Bench provides a standardized evaluation foundation and methodological infrastructure for future research in generative advertising.

📝 Abstract
Generative Engine Marketing (GEM) is an emerging ecosystem for monetizing generative engines, such as LLM-based chatbots, by seamlessly integrating relevant advertisements into their responses. At the core of GEM lies the generation and evaluation of ad-injected responses. However, existing benchmarks are not specifically designed for this purpose, which limits future research. To address this gap, we propose GEM-Bench, the first comprehensive benchmark for ad-injected response generation in GEM. GEM-Bench includes three curated datasets covering both chatbot and search scenarios, a metric ontology that captures multiple dimensions of user satisfaction and engagement, and several baseline solutions implemented within an extensible multi-agent framework. Our preliminary results indicate that, while simple prompt-based methods achieve reasonable engagement, as measured by metrics such as click-through rate, they often reduce user satisfaction. In contrast, approaches that insert ads based on pre-generated ad-free responses help mitigate this issue but introduce additional overhead. These findings highlight the need for future research on designing more effective and efficient solutions for generating ad-injected responses in GEM.

Problem

Research questions and friction points this paper is trying to address.

Benchmarking ad-injected response generation in generative engines
Evaluating user satisfaction and engagement trade-offs in ad integration
Addressing limitations of existing benchmarks for marketing in chatbots

Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive benchmark for ad-injected responses
Multi-agent framework with baseline solutions
Metric ontology measuring user satisfaction dimensions
Silan Hu
National University of Singapore, Singapore
Shiqi Zhang
National University of Singapore, Singapore; PyroWis AI, Singapore
Yimin Shi
National University of Singapore
LLM, data management, information retrieval, conversational agents
Xiaokui Xiao
National University of Singapore
Databases, Data Management, Data Privacy