Controlling Output Rankings in Generative Engines for LLM-based Search

📅 2026-02-03

📈 Citations: 0

✨ Influential: 0

career value

215K/year

🤖 AI Summary

This work addresses the vulnerability of large language model (LLM)-based search and recommendation systems to initial retrieval order, which often leads to insufficient exposure for small merchants and independent creators. To tackle this issue, the authors propose CORE, a non-intrusive method that, for the first time, enables output ranking control in black-box LLM search systems by injecting optimized textual cues—specifically string-based, reasoning-based, and review-based prompts—into the retrieved results. Without modifying the underlying models or APIs, CORE significantly improves target item rankings across major LLMs including GPT-4o, Gemini-2.5, Claude-4, and Grok-3, achieving average promotion success rates of 91.4%, 86.6%, and 80.3% for Top-5, Top-3, and Top-1 positions, respectively, across 15 product categories, while preserving content fluency. The study also introduces ProductBench, the first large-scale benchmark for this task.

Technology Category

Application Category

📝 Abstract

The way customers search for and choose products is changing with the rise of large language models (LLMs). LLM-based search, or generative engines, provides direct product recommendations to users, rather than traditional online search results that require users to explore options themselves. However, these recommendations are strongly influenced by the initial retrieval order of LLMs, which disadvantages small businesses and independent creators by limiting their visibility. In this work, we propose CORE, an optimization method that \textbf{C}ontrols \textbf{O}utput \textbf{R}ankings in g\textbf{E}nerative Engines for LLM-based search. Since the LLM's interactions with the search engine are black-box, CORE targets the content returned by search engines as the primary means of influencing output rankings. Specifically, CORE optimizes retrieved content by appending strategically designed optimization content to steer the ranking of outputs. We introduce three types of optimization content: string-based, reasoning-based, and review-based, demonstrating their effectiveness in shaping output rankings. To evaluate CORE in realistic settings, we introduce ProductBench, a large-scale benchmark with 15 product categories and 200 products per category, where each product is associated with its top-10 recommendations collected from Amazon's search interface. Extensive experiments on four LLMs with search capabilities (GPT-4o, Gemini-2.5, Claude-4, and Grok-3) demonstrate that CORE achieves an average Promotion Success Rate of \textbf{91.4\% @Top-5}, \textbf{86.6\% @Top-3}, and \textbf{80.3\% @Top-1}, across 15 product categories, outperforming existing ranking manipulation methods while preserving the fluency of optimized content.

Problem

Research questions and friction points this paper is trying to address.

LLM-based search

output ranking

generative engines

visibility bias

product recommendation

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based search

output ranking control

content optimization