RLRF: Competitive Search Agent Design via Reinforcement Learning from Ranker Feedback

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of optimizing document content for improved ranking in competitive search environments, where existing methods rely heavily on manual annotations, generalize poorly, and fail to adapt to dynamic adversarial ranking strategies. To overcome these limitations, the authors propose RLRF (Reinforcement Learning from Ranker Feedback), a framework that uses raw ranking feedback directly as the reward signal for end-to-end training. RLRF employs a multi-agent competitive simulation to generate preference data autonomously, eliminating the need for human annotation. Crucially, it supports generalization across ranking functions and online adaptation to evolving opponent strategies. Experimental results show that the agent significantly outperforms baseline approaches under out-of-distribution ranking functions and in dynamic competition settings, exhibiting strong robustness and strategic adaptability.

📝 Abstract
Competitive search is a setting where document publishers modify their documents to improve their ranking in response to a query. Recently, publishers have increasingly leveraged LLMs to generate and modify competitive content. We introduce Reinforcement Learning from Ranker Feedback (RLRF), a framework that trains LLMs using preference datasets derived from ranking competitions. The goal of an LLM-based publisher agent is to optimize content for improved ranking while accounting for the strategies of competing agents. We generate the datasets using approaches that do not rely on human-authored data. We show that our proposed agents consistently and substantially outperform previously suggested approaches for LLM-based competitive document modification. We further show that our agents are effective with ranking functions they were not trained for (i.e., out of distribution) and that they adapt to strategic opponents. These findings attest to the significant potential of using reinforcement learning in competitive search.
Problem

Research questions and friction points this paper is trying to address.

Training LLMs to optimize content for improved search rankings
Developing competitive search agents via reinforcement learning from ranker feedback
Creating agents that adapt to strategic opponents and unseen ranking functions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trains LLMs using ranking competition preference datasets
Generates datasets without human-authored data
Optimizes content for improved ranking against competitors
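The bullets above hinge on turning raw ranker output into training data without human annotation. A minimal sketch of that idea in Python; the term-overlap ranker, function names, and pairing scheme here are illustrative stand-ins, not the paper's actual implementation:

```python
# Sketch: derive (chosen, rejected) preference pairs from a ranking
# function alone, the kind of data an RLRF-style trainer could consume.
# `toy_ranker` is a stand-in for a real ranking function.
from itertools import combinations


def toy_ranker(query: str, doc: str) -> float:
    """Stand-in ranking function: fraction of document terms
    that also appear in the query."""
    q_terms = set(query.lower().split())
    d_terms = doc.lower().split()
    return sum(t in q_terms for t in d_terms) / max(len(d_terms), 1)


def preference_pairs(query: str, candidates: list[str]) -> list[tuple[str, str]]:
    """Rank candidate document revisions and emit (chosen, rejected)
    pairs: each higher-ranked candidate is preferred over each
    lower-ranked one. No human labels are involved."""
    ranked = sorted(candidates, key=lambda d: toy_ranker(query, d), reverse=True)
    return list(combinations(ranked, 2))


candidates = [
    "travel tips for europe and asia",
    "best budget travel tips cheap flights and hostels",
    "our company history and mission statement",
]
pairs = preference_pairs("budget travel tips", candidates)
```

In a multi-agent simulation, the candidates would instead be rival agents' documents for the same query, so the preference data tracks the competition as opponents change strategy.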
Tommy Mordo
Faculty of Data and Decision Sciences, Technion – Israel Institute of Technology
Sagie Dekel
Faculty of Data and Decision Sciences, Technion – Israel Institute of Technology
Omer Madmon
PhD Candidate, Technion
Game Theory, Information Design, Mechanism Design, Machine Learning, Artificial Intelligence
Moshe Tennenholtz
Technion
Oren Kurland
Technion
Information Retrieval