DMA: Online RAG Alignment with Human Feedback

📅 2025-11-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Static retrieval in RAG systems struggles to adapt to evolving user intent and content drift. To address this, the authors propose Dynamic Memory Alignment (DMA), an online learning framework that aligns ranking in interactive settings. DMA organizes document-level, list-level, and response-level human feedback into a single pipeline: supervised training of pointwise and listwise rankers, policy optimization driven by response-level preferences, and knowledge distillation into a lightweight scorer for low-latency serving. Offline, DMA preserves competitive foundational retrieval while yielding notable gains on TriviaQA and HotpotQA. Online, large-scale A/B ablations isolate the utility of each feedback source, and a multi-month industrial deployment shows substantial improvements in user engagement.

📝 Abstract

Retrieval-augmented generation (RAG) systems often rely on static retrieval, limiting adaptation to evolving intent and content drift. We introduce Dynamic Memory Alignment (DMA), an online learning framework that systematically incorporates multi-granularity human feedback to align ranking in interactive settings. DMA organizes document-, list-, and response-level signals into a coherent learning pipeline: supervised training for pointwise and listwise rankers, policy optimization driven by response-level preferences, and knowledge distillation into a lightweight scorer for low-latency serving. Throughout this paper, memory refers to the model's working memory, which is the entire context visible to the LLM for In-Context Learning. We adopt a dual-track evaluation protocol mirroring deployment: (i) large-scale online A/B ablations to isolate the utility of each feedback source, and (ii) few-shot offline tests on knowledge-intensive benchmarks. Online, a multi-month industrial deployment further shows substantial improvements in human engagement. Offline, DMA preserves competitive foundational retrieval while yielding notable gains on conversational QA (TriviaQA, HotpotQA). Taken together, these results position DMA as a principled approach to feedback-driven, real-time adaptation in RAG without sacrificing baseline capability.
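To make the distinction between pointwise and listwise ranking supervision concrete, here is a minimal sketch. The abstract does not state which loss functions DMA uses, so both choices below are assumptions for illustration: binary cross-entropy for document-level feedback, and a ListNet-style softmax cross-entropy for list-level feedback. The function names are hypothetical.

```python
import math


def pointwise_loss(scores, labels):
    """Mean binary cross-entropy between sigmoid(score) and 0/1 labels.

    Assumption: document-level feedback arrives as a 0/1 label per document.
    Uses the numerically stable form softplus(s) - y * s.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        softplus = max(s, 0.0) + math.log1p(math.exp(-abs(s)))
        total += softplus - y * s
    return total / len(scores)


def softmax(xs):
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_loss(scores, relevance):
    """ListNet-style loss: cross-entropy between softmax(relevance) and softmax(scores).

    Assumption: list-level feedback arrives as a graded relevance per position.
    """
    target = softmax(relevance)
    pred = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


# Illustrative data only, not taken from the paper.
relevance = [2.0, 1.0, 0.0]
aligned = listwise_loss([2.0, 1.0, 0.0], relevance)   # scores agree with feedback
reversed_ = listwise_loss([0.0, 1.0, 2.0], relevance)  # scores contradict feedback
```

The pointwise loss judges each document in isolation, while the listwise loss compares the whole score distribution against the feedback, so a ranking that contradicts the feedback order is penalized more than one that agrees with it.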

Problem

Research questions and friction points this paper is trying to address.

Adapting RAG systems to evolving user intent and content drift

Incorporating multi-granularity human feedback for ranking alignment

Enabling real-time adaptation without sacrificing baseline retrieval capability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online learning framework with multi-granularity human feedback

Pipeline combining supervised training and policy optimization

Lightweight scorer for low-latency serving via distillation
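The distillation step in the last item can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the student is assumed to be a linear scorer over hand-made features, the teacher scores are assumed to be given, and the objective is assumed to be mean squared error minimized by batch gradient descent. All names (`distill`, `mse`, `dot`) are hypothetical.

```python
def dot(w, x):
    """Linear student score: inner product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, features, teacher_scores):
    """Mean squared error between student scores and teacher scores."""
    n = len(features)
    return sum((dot(w, x) - t) ** 2 for x, t in zip(features, teacher_scores)) / n


def distill(features, teacher_scores, lr=0.1, epochs=200):
    """Fit a linear student to teacher scores with batch gradient descent on MSE."""
    n = len(features)
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(features, teacher_scores):
            err = dot(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Illustrative data only: three documents, two features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher_scores = [2.0, -1.0, 1.0]  # stand-in for scores from the larger aligned ranker

initial_error = mse([0.0, 0.0], features, teacher_scores)
student = distill(features, teacher_scores)
final_error = mse(student, features, teacher_scores)
```

At serving time only `dot(student, x)` would be evaluated per document, which is the sense in which a distilled scorer can be cheaper than the model that produced the teacher scores.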

🔎 Similar Papers

FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research