DyKnow-RAG: Dynamic Knowledge Utilization Reinforcement Framework for Noisy Retrieval-Augmented Generation in E-commerce Search Relevance

📅 2025-10-13
🤖 AI Summary
Problem: In e-commerce search, long-tail, knowledge-intensive, and fast-evolving queries exceed what parametric large language models can cover when modeling query-item relevance; meanwhile, noisy external context (e.g., reviews, attribute encyclopedias) cannot be cleaned beforehand because of latency and cost constraints. Method: DyKnow-RAG is a dynamic context-decision framework that, within a single retrieval-generation pass, decides per query whether to use, partially use, or ignore the retrieved content. It adds a posterior-driven advantage reweighting mechanism to Group Relative Policy Optimization (GRPO), so the model learns the trade-off between relying on retrieval and relying on parametric knowledge without extra annotations or inference overhead. Training combines SFT initialization, an RL pool prioritized by SFT uncertainty, and an optional lightweight DPO warm start. Results: The method outperforms SFT, DPO, and vanilla GRPO in offline evaluation and improves GSB, Query Goodrate, and Item Goodrate in Taobao A/B tests. It is deployed in Taobao's production relevance system and serves live traffic.
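
As a rough illustration of uncertainty-guided selection of RL training examples, the sketch below ranks examples by the entropy of the SFT model's relevance prediction and keeps the most uncertain ones. The summary does not say how uncertainty is measured, so the binary-relevance setup, the entropy score, and the names `binary_entropy`, `build_rl_pool`, and `sft_probs` are all assumptions made for this example.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction; highest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def build_rl_pool(sft_probs: dict, pool_size: int) -> list:
    """Return the ids of the pool_size examples with the most uncertain SFT prediction.

    Assumption: uncertainty is scored by binary entropy; the source does not
    specify the actual criterion.
    """
    ranked = sorted(
        sft_probs,
        key=lambda ex_id: binary_entropy(sft_probs[ex_id]),
        reverse=True,
    )
    return ranked[:pool_size]


# Hypothetical SFT probabilities that each query-item pair is relevant.
sft_probs = {"q1": 0.98, "q2": 0.55, "q3": 0.10, "q4": 0.40}
pool = build_rl_pool(sft_probs, pool_size=2)
print(pool)  # the two examples closest to p = 0.5
```

Confident predictions such as `q1` are left out of the pool, which matches the stated intent of focusing RL on cases where the context-usage choice matters most.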

📝 Abstract
Accurately modeling query-item relevance drives e-commerce ranking, yet long-tail, knowledge-heavy, and fast-evolving queries exceed parametric LLM coverage. External context (reviews, attribute encyclopedias, UGC) can help but is noisy, and single-pass latency and cost forbid any clean-then-summarize step. The model must, per query, judge relevance and decide whether to use, partially use, or ignore the context. DyKnow-RAG is a dynamic noisy-RAG framework built on Group Relative Policy Optimization. It trains two rollout groups (no external context vs a single retrieved chunk) and applies posterior-driven inter-group advantage scaling that adaptively reweights their contributions by the per-query correctness gap. This teaches when to trust retrieval versus fall back to parametric knowledge, without process labels, value networks, or extra inference passes, preserving single-pass, single-chunk deployment under production latency. Training combines: (1) supervised initialization with a structured rationale that explicitly records the context-usage decision; (2) an RL pool prioritized by SFT uncertainty to focus where context choice is most consequential; and (3) an optional lightweight DPO warm start to stabilize with-context calibration. Under a unified retrieval/index and fixed latency budget, DyKnow-RAG outperforms SFT, DPO, and vanilla GRPO in offline tests, and delivers consistent lifts on GSB, Query Goodrate, and Item Goodrate in Taobao A/B testing. It is deployed in Taobao's production relevance system, serving live traffic. To our knowledge, it is among the first single-pass RAG solutions for e-commerce relevance, turning noisy external signals into reliable gains without added online complexity.
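
The inter-group advantage scaling described in the abstract can be sketched roughly as follows. The abstract gives no formula, so this is only one plausible reading: advantages are standardized within each rollout group as in GRPO, then each group is reweighted by the per-query gap in mean correctness. The linear weighting rule, the `alpha` parameter, and the function names are illustrative assumptions, not the paper's method.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def scaled_advantages(rewards_no_ctx, rewards_ctx, alpha=1.0):
    """Reweight the two groups by their per-query correctness gap.

    Assumption: rewards are 0/1 correctness and the weights are linear in the
    gap; the abstract does not state the actual scaling rule.
    """
    gap = mean(rewards_ctx) - mean(rewards_no_ctx)  # > 0 means retrieval helped
    w_ctx = 1.0 + alpha * gap   # trust with-context rollouts more when they win
    w_no = 1.0 - alpha * gap    # and no-context rollouts more when they win
    adv_no = [w_no * a for a in group_advantages(rewards_no_ctx)]
    adv_ctx = [w_ctx * a for a in group_advantages(rewards_ctx)]
    return adv_no, adv_ctx, gap


# Hypothetical rollouts for one query: 1 = correct relevance judgment.
rewards_no_ctx = [0, 0, 1, 0]   # parametric knowledge only
rewards_ctx = [1, 1, 1, 0]      # with a single retrieved chunk
adv_no, adv_ctx, gap = scaled_advantages(rewards_no_ctx, rewards_ctx)
print(gap)
```

With 0/1 rewards and `alpha` at most 1, both weights stay in the range 0 to 2, so a group is damped but its advantages never change sign. No value network or extra inference pass is involved, consistent with the single-pass constraint the abstract emphasizes.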

Problem

Research questions and friction points this paper is trying to address.

Dynamic knowledge utilization for noisy retrieval-augmented generation in e-commerce
Judging when to use or ignore noisy external context for query relevance
Single-pass framework balancing parametric knowledge and retrieved information

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic knowledge utilization with Group Relative Policy Optimization
Posterior-driven inter-group advantage scaling for adaptive weighting
Single-pass deployment without process labels or extra inference