Controlling Output Rankings in Generative Engines for LLM-based Search

📅 2026-02-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the vulnerability of large language model (LLM)-based search and recommendation systems to initial retrieval order, which often leads to insufficient exposure for small merchants and independent creators. To tackle this issue, the authors propose CORE, a non-intrusive method that, for the first time, enables output ranking control in black-box LLM search systems by injecting optimized textual cues—specifically string-based, reasoning-based, and review-based prompts—into the retrieved results. Without modifying the underlying models or APIs, CORE significantly improves target item rankings across major LLMs including GPT-4o, Gemini-2.5, Claude-4, and Grok-3, achieving average promotion success rates of 91.4%, 86.6%, and 80.3% for Top-5, Top-3, and Top-1 positions, respectively, across 15 product categories, while preserving content fluency. The study also introduces ProductBench, the first large-scale benchmark for this task.

Technology Category

Application Category

📝 Abstract
The way customers search for and choose products is changing with the rise of large language models (LLMs). LLM-based search, or generative engines, provides direct product recommendations to users, rather than traditional online search results that require users to explore options themselves. However, these recommendations are strongly influenced by the initial retrieval order of LLMs, which disadvantages small businesses and independent creators by limiting their visibility. In this work, we propose CORE, an optimization method that \textbf{C}ontrols \textbf{O}utput \textbf{R}ankings in g\textbf{E}nerative Engines for LLM-based search. Since the LLM's interactions with the search engine are black-box, CORE targets the content returned by search engines as the primary means of influencing output rankings. Specifically, CORE optimizes retrieved content by appending strategically designed optimization content to steer the ranking of outputs. We introduce three types of optimization content: string-based, reasoning-based, and review-based, demonstrating their effectiveness in shaping output rankings. To evaluate CORE in realistic settings, we introduce ProductBench, a large-scale benchmark with 15 product categories and 200 products per category, where each product is associated with its top-10 recommendations collected from Amazon's search interface. Extensive experiments on four LLMs with search capabilities (GPT-4o, Gemini-2.5, Claude-4, and Grok-3) demonstrate that CORE achieves an average Promotion Success Rate of \textbf{91.4\% @Top-5}, \textbf{86.6\% @Top-3}, and \textbf{80.3\% @Top-1}, across 15 product categories, outperforming existing ranking manipulation methods while preserving the fluency of optimized content.
Problem

Research questions and friction points this paper is trying to address.

LLM-based search
output ranking
generative engines
visibility bias
product recommendation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based search
output ranking control
content optimization
generative engines
ranking manipulation
🔎 Similar Papers
No similar papers found.