🤖 AI Summary
This paper addresses the challenge of optimizing document content for improved ranking in competitive search environments, where existing methods rely heavily on manual annotations, suffer from poor generalizability, and fail to adapt to dynamic adversarial ranking strategies. To overcome these limitations, we propose RLRF (Reinforcement Learning from Ranker Feedback), a framework that uniquely leverages raw ranking feedback directly as the reward signal for end-to-end training. RLRF employs a multi-agent competitive simulation to autonomously generate preference data, eliminating the need for human annotation. Crucially, it supports cross-ranking-function generalization and online adaptation to evolving opponent strategies. Experimental results demonstrate that our agent significantly outperforms baseline approaches under out-of-distribution ranking functions and dynamic competition settings, exhibiting strong robustness and strategic adaptability.
📝 Abstract
Competitive search is a setting where document publishers modify their documents to improve their ranking in response to a query. Recently, publishers have increasingly leveraged LLMs to generate and modify competitive content. We introduce Reinforcement Learning from Ranker Feedback (RLRF), a framework that trains LLMs using preference datasets derived from ranking competitions. The goal of a publisher (LLM-based) agent is to optimize content for improved ranking while accounting for the strategies of competing agents. We generate the datasets using approaches that do not rely on human-authored data. We show that our proposed agents consistently and substantially outperform previously suggested approaches for LLM-based competitive document modification. We further show that our agents are effective with ranking functions they were not trained on (i.e., out of distribution) and that they adapt to strategic opponents. These findings support the significant potential of reinforcement learning in competitive search.
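To make the idea of deriving preference data from ranking competitions concrete, here is a minimal, hypothetical sketch: candidate document revisions are scored by a ranker, and the top-ranked revision is paired against lower-ranked ones as chosen/rejected preference pairs. The function and variable names (`preference_pairs`, `rank_fn`, the toy term-overlap ranker) are illustrative assumptions, not the paper's actual implementation.

```python
def preference_pairs(query, candidate_docs, rank_fn):
    """Sort candidate revisions by ranker score and pair the best-ranked
    version (chosen) against each worse-ranked one (rejected)."""
    scored = sorted(candidate_docs, key=lambda d: rank_fn(query, d), reverse=True)
    chosen = scored[0]
    return [{"prompt": query, "chosen": chosen, "rejected": d} for d in scored[1:]]


def toy_rank(query, doc):
    # Toy stand-in for a ranking function: count query-term overlap.
    return len(set(query.split()) & set(doc.split()))


pairs = preference_pairs(
    "best running shoes",
    ["best running shoes guide", "cheap sandals", "running gear"],
    toy_rank,
)
# Each pair can then feed a preference-based RL/DPO-style training step.
```

In a full pipeline, the toy ranker would be replaced by the actual ranking function (or competition outcome), and the resulting pairs would serve as the training signal, with no human annotation required.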