🤖 AI Summary
In multi-turn dialogue, large language models suffer from cumulative contextual decay - manifesting as attention pollution, dilution, and drift - which progressively degrades contextual integrity and impairs both accuracy and instruction adherence. To address this, the authors propose Rhea, a framework that formally conceptualizes and mitigates *cumulative contextual decay*. Rhea introduces a dual-memory architecture - an Instructional Memory for persistent global constraints and an Episodic Memory for user-model interactions - coupled with a role-aware heuristic episodic attention mechanism that decouples and then fuses the two memories. It further employs structural priority scheduling, asymmetric noise suppression, and heuristic context retrieval to construct high signal-to-noise-ratio inference contexts. Evaluated across multiple multi-turn dialogue benchmarks, Rhea improves average accuracy by 1.04 points on a 10-point scale (a 16% relative gain over strong baselines) and maintains an Instruction Adherence Rate (IAR) above 8.1, significantly alleviating performance degradation in long-horizon dialogues.
📝 Abstract
Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We define this phenomenon as cumulative contextual decay - a progressive degradation of contextual integrity caused by attention pollution, dilution, and drift. To address this challenge, we propose Rhea (Role-aware Heuristic Episodic Attention), a novel framework that decouples conversation history into two functionally independent memory modules: (1) an Instructional Memory (IM) that persistently stores high-fidelity global constraints via a structural priority mechanism, and (2) an Episodic Memory (EM) that dynamically manages user-model interactions via asymmetric noise control and heuristic context retrieval. During inference, Rhea constructs a high signal-to-noise context by applying its priority attention: selectively integrating relevant episodic information while always prioritizing global instructions. Experiments on multiple multi-turn conversation benchmarks - including MT-Eval and Long-MT-Bench+ - validate this approach: Rhea mitigates performance decay and improves overall accuracy by 1.04 points on a 10-point scale (a 16% relative gain over strong baselines). Moreover, Rhea maintains near-perfect instruction fidelity (IAR > 8.1 on a 10-point scale) across long-horizon interactions. These results demonstrate that Rhea provides a principled and effective framework for building more precise, instruction-consistent conversational LLMs.
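The abstract's pipeline - persistent instruction storage, asymmetric noise suppression on episodic turns, heuristic retrieval, and instruction-first context assembly - can be illustrated with a minimal sketch. All class and function names below are illustrative assumptions, not the paper's actual API; the salience scores and token-overlap retrieval stand in for whatever heuristics Rhea actually uses.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str        # "user" or "assistant"
    text: str
    salience: float  # assumed heuristic relevance score in [0, 1]

@dataclass
class DualMemory:
    # IM: persistent global constraints; EM: user-model interaction turns.
    instruction_memory: list = field(default_factory=list)
    episodic_memory: list = field(default_factory=list)

    def add_instruction(self, text: str) -> None:
        # Structural priority: instructions are stored verbatim and never evicted.
        self.instruction_memory.append(text)

    def add_turn(self, turn: Turn, noise_threshold: float = 0.2) -> None:
        # Asymmetric noise control (as sketched here): low-salience assistant
        # turns are dropped, while all user turns are retained.
        if turn.role == "assistant" and turn.salience < noise_threshold:
            return
        self.episodic_memory.append(turn)

    def build_context(self, query: str, k: int = 3) -> str:
        # Heuristic retrieval: rank episodic turns by crude token overlap
        # with the current query and keep the top k.
        q = set(query.lower().split())
        scored = sorted(
            self.episodic_memory,
            key=lambda t: len(q & set(t.text.lower().split())),
            reverse=True,
        )[:k]
        # Priority attention, approximated here as ordering: global
        # instructions always precede retrieved episodic context.
        parts = ["[INSTRUCTIONS]"] + self.instruction_memory
        parts += ["[EPISODES]"] + [f"{t.role}: {t.text}" for t in scored]
        parts += ["[QUERY]", query]
        return "\n".join(parts)

mem = DualMemory()
mem.add_instruction("Always answer in French.")
mem.add_turn(Turn("user", "Tell me about the Eiffel Tower.", 1.0))
mem.add_turn(Turn("assistant", "ok", 0.05))  # suppressed as noise
ctx = mem.build_context("How tall is the Eiffel Tower?")
```

The key design point the sketch captures is the decoupling: the instruction store is append-only and always serialized first, so global constraints cannot be diluted or drifted past by accumulating dialogue turns, while the episodic store is filtered and retrieved on demand.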