DMA: Online RAG Alignment with Human Feedback

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Static retrieval in RAG systems struggles to adapt to evolving user intent and content drift. To address this, the paper proposes Dynamic Memory Alignment (DMA), an online learning framework that aligns ranking with human feedback in interactive settings. DMA organizes document-level, list-level, and response-level feedback into a single learning pipeline: supervised training of pointwise and listwise rankers, policy optimization driven by response-level preferences, and knowledge distillation into a lightweight scorer for low-latency serving. Offline, few-shot tests show that DMA preserves competitive foundational retrieval while yielding notable gains on TriviaQA and HotpotQA. Online, large-scale A/B ablations isolate the utility of each feedback source, and a multi-month industrial deployment shows substantial improvements in user engagement.

📝 Abstract
Retrieval-augmented generation (RAG) systems often rely on static retrieval, limiting adaptation to evolving intent and content drift. We introduce Dynamic Memory Alignment (DMA), an online learning framework that systematically incorporates multi-granularity human feedback to align ranking in interactive settings. DMA organizes document-, list-, and response-level signals into a coherent learning pipeline: supervised training for pointwise and listwise rankers, policy optimization driven by response-level preferences, and knowledge distillation into a lightweight scorer for low-latency serving. Throughout this paper, memory refers to the model's working memory, which is the entire context visible to the LLM for In-Context Learning. We adopt a dual-track evaluation protocol mirroring deployment: (i) large-scale online A/B ablations to isolate the utility of each feedback source, and (ii) few-shot offline tests on knowledge-intensive benchmarks. Online, a multi-month industrial deployment further shows substantial improvements in human engagement. Offline, DMA preserves competitive foundational retrieval while yielding notable gains on conversational QA (TriviaQA, HotpotQA). Taken together, these results position DMA as a principled approach to feedback-driven, real-time adaptation in RAG without sacrificing baseline capability.
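To make the three feedback granularities concrete, here is a minimal sketch of one loss per level. It is not the paper's actual objective: it assumes binary cross-entropy for document-level labels, a ListNet-style softmax cross-entropy for list-level labels, and a Bradley-Terry pairwise loss for response-level preferences. All function names are hypothetical.

```python
import math


def pointwise_loss(score, label):
    """Binary cross-entropy for one document-level label (1 = helpful, 0 = not)."""
    p = 1.0 / (1.0 + math.exp(-score))  # sigmoid of the ranker score
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def _softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, relevance):
    """ListNet-style cross-entropy between softmax(relevance) and softmax(scores)."""
    target = _softmax(relevance)
    pred = _softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry negative log-likelihood that the chosen response wins."""
    return math.log(1.0 + math.exp(-(reward_chosen - reward_rejected)))
```

In this sketch, the pointwise and listwise losses would train the supervised rankers, while the pairwise loss is one common way to turn response-level preferences into a reward signal for policy optimization. How DMA actually combines these signals is not specified in the abstract.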
Problem

Research questions and friction points this paper is trying to address.

Adapting RAG systems to evolving user intent and content drift
Incorporating multi-granularity human feedback for ranking alignment
Enabling real-time adaptation without sacrificing baseline retrieval capability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online learning framework with multi-granularity human feedback
Pipeline combining supervised training and policy optimization
Lightweight scorer for low-latency serving via distillation
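
The distillation step in the last item can be illustrated with a small sketch. It assumes the simplest form of score distillation: a linear student fitted by gradient descent to regress a teacher ranker's scores under mean squared error. The paper's actual student architecture and loss are not given here, and `distill_linear_scorer` and `student_score` are hypothetical names.

```python
def distill_linear_scorer(features, teacher_scores, lr=0.1, epochs=500):
    """Fit weights w and bias b so that w.x + b approximates the teacher score."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, t in zip(features, teacher_scores):
            # Residual between the student prediction and the teacher score.
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - t
            for i in range(dim):
                grad_w[i] += 2.0 * err * x[i] / n
            grad_b += 2.0 * err / n
        # Full-batch gradient descent step on the mean squared error.
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def student_score(w, b, x):
    """Cheap serving-time score: one dot product plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b
```

At serving time only `student_score` runs, which is the point of distilling into a lightweight scorer: the expensive teacher is needed only when producing training targets.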
Yu Bai
Zhongguancun Laboratory
Yukai Miao
Zhongguancun Laboratory
Dawei Wang
Zhongguancun Laboratory
Li Chen
Zhongguancun Laboratory
Fei Long
Tsinghua University
Rundi Zhai
Beijing University of Posts and Telecommunications
Dan Li
Tsinghua University
Yanyu Ren
Tsinghua University
ML Systems, AI for Network, CS Education
Tianfeng Liu
Zhongguancun Laboratory
Hongtao Xie
China Mobile Communications Group Co., Ltd.
Ce Yang
China Mobile Communications Group Co., Ltd.
Xuhui Cai
China Mobile Communications Group Co., Ltd.