Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-turn dialogue, large language models suffer from cumulative contextual decay, manifesting as attention pollution, dilution, and drift, which progressively degrades contextual integrity and impairs both accuracy and instruction adherence. To address this, we propose Rhea, the first framework to formally conceptualize and mitigate cumulative contextual decay. Rhea introduces a dual-memory architecture comprising an Instructional Memory and an Episodic Memory, coupled with a role-aware heuristic episodic attention mechanism that efficiently decouples and fuses the two. It further employs structural priority scheduling, asymmetric noise suppression, and heuristic context retrieval to construct high signal-to-noise inference contexts. Evaluated on multiple multi-turn dialogue benchmarks, Rhea improves average accuracy by +1.04 points on a 10-point scale, a 16% relative gain over strong baselines, and attains an Instruction Adherence Rate (IAR) above 8.1, significantly alleviating performance degradation in long-horizon dialogues.

📝 Abstract
Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We define this phenomenon as cumulative contextual decay: a progressive degradation of contextual integrity caused by attention pollution, dilution, and drift. To address this challenge, we propose Rhea (Role-aware Heuristic Episodic Attention), a novel framework that decouples conversation history into two functionally independent memory modules: (1) an Instructional Memory (IM) that persistently stores high-fidelity global constraints via a structural priority mechanism, and (2) an Episodic Memory (EM) that dynamically manages user-model interactions via asymmetric noise control and heuristic context retrieval. During inference, Rhea constructs a high signal-to-noise context by applying priority attention: selectively integrating relevant episodic information while always prioritizing global instructions. Experiments on multiple multi-turn conversation benchmarks, including MT-Eval and Long-MT-Bench+, show that Rhea mitigates performance decay and improves overall accuracy by 1.04 points on a 10-point scale (a 16% relative gain over strong baselines). Moreover, Rhea maintains near-perfect instruction fidelity (IAR > 8.1) across long-horizon interactions. These results demonstrate that Rhea provides a principled and effective framework for building more precise, instruction-consistent conversational LLMs.
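The dual-memory design described in the abstract can be sketched in a few lines. This is an illustrative interpretation, not the authors' implementation: the class name `RheaContext`, the lexical-overlap `relevance` heuristic, and the top-k cutoff are all assumptions standing in for the paper's structural priority scheduling, asymmetric noise suppression, and heuristic context retrieval.

```python
# Illustrative sketch (not the paper's code) of Rhea's dual-memory idea:
# an Instructional Memory (IM) persistently holds global constraints at top
# priority, while an Episodic Memory (EM) holds the turn history, filtered
# by a relevance heuristic before it enters the inference context.
from dataclasses import dataclass, field

@dataclass
class RheaContext:
    instruction_memory: list = field(default_factory=list)  # persistent global constraints (IM)
    episodic_memory: list = field(default_factory=list)     # user-model interaction history (EM)

    def add_instruction(self, text: str) -> None:
        self.instruction_memory.append(text)

    def add_turn(self, text: str) -> None:
        self.episodic_memory.append(text)

    def relevance(self, query: str, turn: str) -> float:
        # Toy lexical-overlap score, a stand-in for the paper's
        # heuristic context retrieval.
        q, t = set(query.lower().split()), set(turn.lower().split())
        return len(q & t) / (len(q) or 1)

    def build_context(self, query: str, k: int = 2) -> list:
        # Priority attention: global instructions always come first
        # (structural priority); only the k most relevant episodes follow,
        # dropping low-relevance turns (noise suppression).
        episodes = sorted(self.episodic_memory,
                          key=lambda t: self.relevance(query, t),
                          reverse=True)[:k]
        return self.instruction_memory + episodes

# Usage: instructions survive every turn; only relevant history is retrieved.
ctx = RheaContext()
ctx.add_instruction("Always answer in French.")
ctx.add_turn("We discussed the weather in Paris.")
ctx.add_turn("User asked about train schedules.")
print(ctx.build_context("train schedules to Paris", k=1))
```

In this toy form, the key property is that the instruction list is concatenated unconditionally while episodic turns must earn their place via the relevance score, mirroring the asymmetry between the two memories that the abstract describes.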
Problem

Research questions and friction points this paper is trying to address.

Addresses cumulative contextual decay in multi-turn conversations
Proposes decoupled memory modules for global and episodic information
Improves instruction fidelity and accuracy in conversational LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples conversation history into two memory modules
Uses priority attention to integrate episodic information
Maintains high instruction fidelity in long interactions
👥 Authors
Wanyang Hong
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Zhaoning Zhang
National University of Defense Technology
Yi Chen
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Libo Zhang
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Baihui Liu
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Linbo Qiao
National University of Defense Technology
Zhiliang Tian
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Dongsheng Li
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China