Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Conventional relevance models output scalar scores with limited interpretability, while existing Generative Relevance Models (GRMs) rely heavily on large-scale human annotations or synthetic Chain-of-Thought (CoT) data, and thus suffer from poor generalization and overly generic, domain-agnostic reasoning that cannot handle the ambiguity and diversity inherent in open-domain search. Method: We formulate Xiaohongshu's search ranking task as a generative reasoning problem and propose Stepwise Advantage Masking (SAM), a reinforcement learning strategy that enables fine-grained process-level supervision. We further combine business-specific multi-step CoT prompting with model distillation to build a lightweight, production-ready GRM. Contribution/Results: Our approach significantly improves relevance performance on industrial-scale datasets; rigorous A/B testing confirms substantial gains in user engagement metrics, and the model has been deployed in production.

📝 Abstract
Ranking relevance is a fundamental task in search engines, aiming to identify the items most relevant to a given user query. Traditional relevance models typically produce scalar scores or directly predict relevance labels, limiting both interpretability and the modeling of complex relevance signals. Inspired by recent advances in Chain-of-Thought (CoT) reasoning for complex tasks, we investigate whether explicit reasoning can enhance both interpretability and performance in relevance modeling. However, existing reasoning-based Generative Relevance Models (GRMs) primarily rely on supervised fine-tuning on large amounts of human-annotated or synthetic CoT data, which often leads to limited generalization. Moreover, domain-agnostic, free-form reasoning tends to be overly generic and insufficiently grounded, limiting its potential to handle the diverse and ambiguous cases prevalent in open-domain search. In this work, we formulate relevance modeling in Xiaohongshu search as a reasoning task and introduce a Reinforcement Learning (RL)-based training framework to enhance the grounded reasoning capabilities of GRMs. Specifically, we incorporate practical business-specific relevance criteria into the multi-step reasoning prompt design and propose Stepwise Advantage Masking (SAM), a lightweight process-supervision strategy which facilitates effective learning of these criteria through improved credit assignment. To enable industrial deployment, we further distill the large-scale RL-tuned model to a lightweight version suitable for real-world search systems. Extensive experiments on industrial datasets, along with online A/B tests, demonstrate the effectiveness of our approach.
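The abstract names Stepwise Advantage Masking (SAM) as a lightweight process-supervision strategy for credit assignment, but this summary does not spell out the mechanism. A minimal sketch of one plausible reading, assuming each CoT step receives a binary pass/fail judgment against the business relevance criteria and token-level advantages come from a standard RL objective (e.g. group-normalized, GRPO-style) -- the function name, signature, and masking rule below are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def stepwise_advantage_mask(advantages, step_spans, step_ok):
    """Zero out token-level advantages for reasoning steps flagged as
    failing the step-level criteria, so policy-gradient credit flows
    only through steps judged consistent with the relevance rubric.

    advantages : (T,) array of per-token advantages
    step_spans : list of (start, end) token index pairs, one per CoT step
    step_ok    : list of bools, True if the step passes its criterion
    """
    masked = advantages.copy()
    for (start, end), ok in zip(step_spans, step_ok):
        if not ok:
            masked[start:end] = 0.0  # no gradient credit for this step
    return masked

# Example: three 2-token steps; the middle step fails its criterion.
adv = np.ones(6)
out = stepwise_advantage_mask(adv, [(0, 2), (2, 4), (4, 6)],
                              [True, False, True])
# out -> [1., 1., 0., 0., 1., 1.]
```

The design intuition is that a sequence-level reward alone spreads credit uniformly over all reasoning tokens; masking at step granularity lets correct steps keep their advantage while faulty steps contribute no gradient.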
Problem

Research questions and friction points this paper is trying to address.

Enhance interpretability and performance in search relevance modeling
Improve generalization of reasoning-based generative relevance models
Enable industrial deployment of enhanced relevance models in real systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning enhances generative relevance reasoning
Stepwise advantage masking improves credit assignment during RL training
Model distillation enables lightweight industrial deployment
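On the last point, the summary says the RL-tuned model is distilled into a lightweight student for deployment but does not give the recipe. A common choice for this kind of compression is soft-label distillation with a temperature-scaled KL objective (Hinton-style); the sketch below assumes that setup and is not the paper's confirmed method:

```python
import numpy as np

def softmax(logits, temperature):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Temperature-softened KL(teacher || student), averaged over the
    batch and rescaled by T^2 so gradients stay comparable across T."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()
    return float(kl * temperature ** 2)
```

When the student matches the teacher exactly the loss is zero, and any divergence in the softened distributions is penalized; in practice this term is typically mixed with a hard-label cross-entropy loss.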
🔎 Similar Papers
2024-02-10 · Knowledge Discovery and Data Mining · Citations: 3