SAGEO Arena: A Realistic Environment for Evaluating Search-Augmented Generative Engine Optimization

📅 2026-02-12
🤖 AI Summary
This work addresses a critical gap in current research on Search-Augmented Generative Engine Optimization (SAGEO): the absence of realistic evaluation environments that support end-to-end assessment while accounting for the retrieval and re-ranking stages as well as the structural information in web pages. To bridge this gap, the authors propose the first SAGEO evaluation framework that preserves authentic web page structures and integrates a complete pipeline of retrieval, re-ranking, and generation, together with a large-scale structured web corpus and a mechanism for evaluating generated responses. Their experiments show that existing methods often degrade retrieval and re-ranking performance in realistic settings, and that leveraging structural information helps mitigate this issue. These findings underscore the need for stage-specific customization throughout the SAGEO pipeline, balancing Search Engine Optimization (SEO) and Generative Engine Optimization (GEO).


📝 Abstract
Search-Augmented Generative Engines (SAGE) have emerged as a new paradigm for information access, bridging web-scale retrieval with generative capabilities to deliver synthesized answers. This shift has fundamentally reshaped how web content gains exposure online, giving rise to Search-Augmented Generative Engine Optimization (SAGEO), the practice of optimizing web documents to improve their visibility in AI-generated responses. Despite growing interest, no evaluation environment currently supports comprehensive investigation of SAGEO. Specifically, existing benchmarks lack end-to-end visibility evaluation of optimization strategies: they operate on predetermined candidate documents, abstracting away the retrieval and reranking stages that precede generation. Moreover, existing benchmarks discard the structural information (e.g., schema markup) present in real web documents, overlooking rich signals that search systems actively leverage in practice. Motivated by these gaps, we introduce SAGEO Arena, a realistic and reproducible environment for stage-level SAGEO analysis. Our objective is to jointly target search-oriented optimization (SEO) and generation-centric optimization (GEO). To achieve this, we integrate a full generative search pipeline over a large-scale corpus of web documents with rich structural information. Our findings reveal that existing approaches remain largely impractical under realistic conditions and often degrade performance in retrieval and reranking. We also find that structural information helps mitigate these limitations, and that effective SAGEO requires tailoring optimization to each pipeline stage. Overall, our benchmark paves the way for realistic SAGEO evaluation and optimization beyond simplified settings.
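To make "stage-level" visibility concrete, the toy sketch below tracks one target document through retrieval, reranking, and generation and reports where it stands after each stage. Everything here is a hypothetical stand-in, not SAGEO Arena's actual components: the lexical-overlap retriever, the reranker that gives a 0.5 bonus for query terms found in a `schema` dict (a crude proxy for schema markup), and a "generator" that simply cites the top-ranked documents.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    # Hypothetical structured markup, e.g. schema.org-style fields.
    schema: dict = field(default_factory=dict)


def tokens(s: str) -> set:
    return set(s.lower().split())


def retrieve(query: str, corpus: list, k: int) -> list:
    # Stage 1 (stand-in retriever): rank by query-term overlap with body text.
    q = tokens(query)
    ranked = sorted(corpus, key=lambda d: (-len(q & tokens(d.text)), d.doc_id))
    return ranked[:k]


def rerank(query: str, candidates: list) -> list:
    # Stage 2 (stand-in reranker): body overlap plus a bonus for query terms
    # that also appear in the structured fields.
    q = tokens(query)

    def score(d: Doc) -> float:
        schema_terms = tokens(" ".join(str(v) for v in d.schema.values()))
        return len(q & tokens(d.text)) + 0.5 * len(q & schema_terms)

    return sorted(candidates, key=lambda d: (-score(d), d.doc_id))


def generate(reranked: list, n_cite: int) -> list:
    # Stage 3 (stand-in generator): cite the top-n reranked documents.
    return [d.doc_id for d in reranked[:n_cite]]


def stage_visibility(query: str, corpus: list, target_id: str,
                     k: int = 3, n_cite: int = 2) -> dict:
    retrieved = retrieve(query, corpus, k)
    reranked = rerank(query, retrieved)
    cited = generate(reranked, n_cite)
    ids_ret = [d.doc_id for d in retrieved]
    ids_rr = [d.doc_id for d in reranked]
    return {
        # 1-based rank at each stage, or None if the document was dropped.
        "retrieval_rank": ids_ret.index(target_id) + 1 if target_id in ids_ret else None,
        "rerank_rank": ids_rr.index(target_id) + 1 if target_id in ids_rr else None,
        "cited": target_id in cited,
    }


EXAMPLE_CORPUS = [
    Doc("a", "solar panel cost guide"),
    Doc("b", "solar panel installation tips", {"name": "solar panel cost"}),
    Doc("c", "wind turbine basics"),
    Doc("d", "panel discussion notes"),
]

if __name__ == "__main__":
    print(stage_visibility("solar panel cost", EXAMPLE_CORPUS, "b"))
    print(stage_visibility("solar panel cost", EXAMPLE_CORPUS, "c"))
```

In this toy run, document "b" is retrieved at rank 2, moves to rank 1 after reranking because of its structured field, and is cited, while "c" never survives retrieval, so no generation-side optimization could make it visible. That gap is the kind of effect an end-to-end, stage-level evaluation is meant to expose and that benchmarks starting from fixed candidate sets cannot observe.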
Problem

Research questions and friction points this paper is trying to address.

Search-Augmented Generative Engine Optimization
evaluation environment
structural information
end-to-end visibility
generative search pipeline
Innovation

Methods, ideas, or system contributions that make the work stand out.

SAGEO
search-augmented generation
structured web data
end-to-end evaluation
generative search pipeline
Sunghwan Kim
Yonsei University
Natural Language Processing, Reinforcement Learning

Wooseok Jeong
Department of Computer Science and Engineering, Konkuk University, Seoul, Republic of Korea

Serin Kim
Department of Artificial Intelligence, Yonsei University, Seoul, Republic of Korea

Sangam Lee
Department of Artificial Intelligence, Yonsei University, Seoul, Republic of Korea

Dongha Lee
Yonsei University
Data Mining, Information Retrieval, Natural Language Processing