🤖 AI Summary
In UGC platforms, RAG systems face challenges in accurately assessing query-document relevance due to sparse user feedback—leading to ambiguous user intent—and high noise in informal, unstructured text.
Method: We propose a decomposition-based reasoning framework grounded in reinforcement learning. It decouples relevance judgment into two subtasks: implicit query intent inference and verbatim snippet extraction, jointly optimized via a tailored reward mechanism that leverages sparse feedback. Crucially, we introduce top-ranked documents as weak supervision signals to enhance discriminative robustness under noisy conditions.
Contribution/Results: Evaluated on multiple offline benchmarks and real-world UGC platform online A/B tests, our method consistently outperforms state-of-the-art baselines, achieving an average 12.7% improvement in relevance assessment accuracy. The decomposition strategy and weakly supervised reward design significantly improve generalization and reliability in low-signal, high-noise retrieval scenarios.
📝 Abstract
Retrieval-augmented generation (RAG) plays a critical role in user-generated content (UGC) platforms, but its effectiveness depends heavily on accurate relevance assessment of query-document pairs. Despite recent advances in applying large language models (LLMs) to relevance modeling, UGC platforms present unique challenges: 1) ambiguous user intent due to sparse user feedback in RAG scenarios, and 2) substantial noise introduced by informal and unstructured language. To address these issues, we propose the Reinforced Reasoning Model for Relevance Assessment (R3A), which introduces a decomposed reasoning framework over queries and candidate documents before scoring. R3A first leverages auxiliary high-ranked documents within the platform to infer latent query intent. It then performs verbatim fragment extraction to justify relevance decisions, thereby reducing errors caused by noisy UGC. Based on a reinforcement learning framework, R3A is optimized to mitigate distortions arising from ambiguous queries and unstructured content. Experimental results show that R3A significantly outperforms existing baseline methods in terms of relevance accuracy, across both offline benchmarks and online experiments.