🤖 AI Summary
This paper addresses the challenge of optimizing document content for improved ranking in competitive search environments, where existing methods rely heavily on manual annotations, suffer from poor generalizability, and fail to adapt to dynamic adversarial ranking strategies. To overcome these limitations, we propose RLRF (Reinforcement Learning from Ranker Feedback), a framework that uniquely leverages raw ranking feedback directly as the reward signal for end-to-end training. RLRF employs a multi-agent competitive simulation to autonomously generate preference data, eliminating the need for human annotation. Crucially, it supports cross-ranking-function generalization and online adaptation to evolving opponent strategies. Experimental results demonstrate that our agent significantly outperforms baseline approaches under out-of-distribution ranking functions and dynamic competition settings, exhibiting strong robustness and strategic adaptability.
📝 Abstract
Competitive search is a setting where document publishers modify their documents to improve their ranking in response to a query. Recently, publishers have increasingly leveraged LLMs to generate and modify competitive content. We introduce Reinforcement Learning from Ranker Feedback (RLRF), a framework that trains LLMs using preference datasets derived from ranking competitions. The goal of a publisher (LLM-based) agent is to optimize content for improved ranking while accounting for the strategies of competing agents. We generate the datasets using approaches that do not rely on human-authored data. We show that our proposed agents consistently and substantially outperform previously suggested approaches for LLM-based competitive document modification. We further show that our agents are effective with ranking functions they were not trained on (i.e., out of distribution) and that they adapt to strategic opponents. These findings support the significant potential of reinforcement learning in competitive search.
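To make the idea of deriving preference data from ranking competitions concrete, here is a minimal, hypothetical sketch: candidate document revisions are scored by a ranker, and the top-ranked revision is paired against lower-ranked ones as chosen/rejected preference pairs. The function and variable names (`preference_pairs`, `rank_fn`, the toy term-overlap ranker) are illustrative assumptions, not the paper's actual implementation.

```python
def preference_pairs(query, candidate_docs, rank_fn):
    """Sort candidate revisions by ranker score and pair the best-ranked
    version (chosen) against each worse-ranked one (rejected)."""
    scored = sorted(candidate_docs, key=lambda d: rank_fn(query, d), reverse=True)
    chosen = scored[0]
    return [{"prompt": query, "chosen": chosen, "rejected": d} for d in scored[1:]]


def toy_rank(query, doc):
    # Toy stand-in for a ranking function: count query-term overlap.
    return len(set(query.split()) & set(doc.split()))


pairs = preference_pairs(
    "best running shoes",
    ["best running shoes guide", "cheap sandals", "running gear"],
    toy_rank,
)
# Each pair can then feed a preference-based RL/DPO-style training step.
```

In a full pipeline, the toy ranker would be replaced by the actual ranking function (or competition outcome), and the resulting pairs would serve as the training signal, with no human annotation required.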