🤖 AI Summary
Static retrieval in RAG systems struggles to adapt to evolving user intent and content drift. To address this, the authors propose Dynamic Memory Alignment (DMA), an online learning framework that aligns ranking with human feedback in interactive settings. DMA organizes document-level, list-level, and response-level feedback into a single learning pipeline: supervised training of pointwise and listwise rankers, policy optimization driven by response-level preferences, and knowledge distillation into a lightweight scorer for low-latency serving. Offline, DMA preserves competitive base retrieval performance while improving results on TriviaQA and HotpotQA. Online, large-scale A/B ablations isolate the contribution of each feedback source, and a multi-month industrial deployment shows substantial gains in user engagement.
📝 Abstract
Retrieval-augmented generation (RAG) systems often rely on static retrieval, limiting adaptation to evolving intent and content drift. We introduce Dynamic Memory Alignment (DMA), an online learning framework that systematically incorporates multi-granularity human feedback to align ranking in interactive settings. DMA organizes document-, list-, and response-level signals into a coherent learning pipeline: supervised training for pointwise and listwise rankers, policy optimization driven by response-level preferences, and knowledge distillation into a lightweight scorer for low-latency serving. Throughout this paper, memory refers to the model's working memory, that is, the entire context visible to the LLM for in-context learning. We adopt a dual-track evaluation protocol mirroring deployment: (i) large-scale online A/B ablations to isolate the utility of each feedback source, and (ii) few-shot offline tests on knowledge-intensive benchmarks. Online, a multi-month industrial deployment shows substantial improvements in human engagement. Offline, DMA preserves competitive foundational retrieval while yielding notable gains on conversational QA (TriviaQA, HotpotQA). Taken together, these results position DMA as a principled approach to feedback-driven, real-time adaptation in RAG without sacrificing baseline capability.
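
To make the distinction between pointwise and listwise ranking supervision concrete, the following is a minimal sketch and not the DMA implementation. It assumes a pointwise ranker trained with binary cross-entropy on per-document labels and a listwise ranker trained with a ListNet-style softmax cross-entropy over a whole candidate list; the abstract does not specify either loss. The function names `pointwise_loss`, `listwise_loss`, and `softmax` are hypothetical.

```python
import math


def pointwise_loss(scores, labels):
    """Mean binary cross-entropy; each (score, label) pair is judged on its own.

    Assumption: scores are raw logits and labels are 0/1 document-level feedback.
    Uses the numerically stable form max(s, 0) - s*y + log(1 + exp(-|s|)).
    """
    total = 0.0
    for s, y in zip(scores, labels):
        total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    return total / len(scores)


def softmax(xs):
    """Softmax with max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_loss(scores, relevance):
    """ListNet-style loss (assumed): cross-entropy between the softmax of the
    graded relevance of a list and the softmax of the model scores."""
    target = softmax(relevance)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


if __name__ == "__main__":
    relevance = [1.0, 0.0, 0.0]   # list-level feedback: first document preferred
    good = [2.0, 0.0, -1.0]       # scores that rank the preferred document first
    bad = [-1.0, 0.0, 2.0]        # scores that rank it last
    print(listwise_loss(good, relevance) < listwise_loss(bad, relevance))
    print(pointwise_loss([0.0, 0.0], [1, 0]))
```

The pointwise loss treats each document independently, whereas the listwise loss depends only on the relative scores within a list, which is why the two kinds of feedback can be consumed by separate training objectives.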