Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-turn dialogue, large language models suffer from cumulative contextual decay, manifesting as attention pollution, dilution, and drift, which progressively degrades contextual integrity and impairs both accuracy and instruction adherence. To address this, we propose Rhea, the first framework to formally conceptualize and mitigate cumulative contextual decay. Rhea introduces a dual-memory architecture comprising an Instructional Memory and an Episodic Memory, coupled with a role-aware heuristic episodic attention mechanism that efficiently decouples and fuses the two. It further employs structural priority scheduling, asymmetric noise suppression, and heuristic context retrieval to construct high signal-to-noise inference contexts. Evaluated on multiple multi-turn dialogue benchmarks, Rhea improves average accuracy by +1.04 points on a 10-point scale, a 16% relative gain over strong baselines, and attains an Instruction Adherence Rate (IAR) above 8.1, significantly alleviating performance degradation in long-horizon dialogues.

📝 Abstract
Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We define this phenomenon as cumulative contextual decay: a progressive degradation of contextual integrity caused by attention pollution, dilution, and drift. To address this challenge, we propose Rhea (Role-aware Heuristic Episodic Attention), a novel framework that decouples conversation history into two functionally independent memory modules: (1) an Instructional Memory (IM) that persistently stores high-fidelity global constraints via a structural priority mechanism, and (2) an Episodic Memory (EM) that dynamically manages user-model interactions via asymmetric noise control and heuristic context retrieval. During inference, Rhea constructs a high signal-to-noise context by applying priority attention: selectively integrating relevant episodic information while always prioritizing global instructions. Experiments on multiple multi-turn conversation benchmarks, including MT-Eval and Long-MT-Bench+, show that Rhea mitigates performance decay and improves overall accuracy by 1.04 points on a 10-point scale (a 16% relative gain over strong baselines). Moreover, Rhea maintains near-perfect instruction fidelity (IAR > 8.1) across long-horizon interactions. These results demonstrate that Rhea provides a principled and effective framework for building more precise, instruction-consistent conversational LLMs.
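The dual-memory design described in the abstract can be sketched in a few lines. This is an illustrative interpretation, not the authors' implementation: the class name `RheaContext`, the lexical-overlap `relevance` heuristic, and the top-k cutoff are all assumptions standing in for the paper's structural priority scheduling, asymmetric noise suppression, and heuristic context retrieval.

```python
# Illustrative sketch (not the paper's code) of Rhea's dual-memory idea:
# an Instructional Memory (IM) persistently holds global constraints at top
# priority, while an Episodic Memory (EM) holds the turn history, filtered
# by a relevance heuristic before it enters the inference context.
from dataclasses import dataclass, field

@dataclass
class RheaContext:
    instruction_memory: list = field(default_factory=list)  # persistent global constraints (IM)
    episodic_memory: list = field(default_factory=list)     # user-model interaction history (EM)

    def add_instruction(self, text: str) -> None:
        self.instruction_memory.append(text)

    def add_turn(self, text: str) -> None:
        self.episodic_memory.append(text)

    def relevance(self, query: str, turn: str) -> float:
        # Toy lexical-overlap score, a stand-in for the paper's
        # heuristic context retrieval.
        q, t = set(query.lower().split()), set(turn.lower().split())
        return len(q & t) / (len(q) or 1)

    def build_context(self, query: str, k: int = 2) -> list:
        # Priority attention: global instructions always come first
        # (structural priority); only the k most relevant episodes follow,
        # dropping low-relevance turns (noise suppression).
        episodes = sorted(self.episodic_memory,
                          key=lambda t: self.relevance(query, t),
                          reverse=True)[:k]
        return self.instruction_memory + episodes

# Usage: instructions survive every turn; only relevant history is retrieved.
ctx = RheaContext()
ctx.add_instruction("Always answer in French.")
ctx.add_turn("We discussed the weather in Paris.")
ctx.add_turn("User asked about train schedules.")
print(ctx.build_context("train schedules to Paris", k=1))
```

In this toy form, the key property is that the instruction list is concatenated unconditionally while episodic turns must earn their place via the relevance score, mirroring the asymmetry between the two memories that the abstract describes.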
Problem

Research questions and friction points this paper is trying to address.

Addresses cumulative contextual decay in multi-turn conversations
Proposes decoupled memory modules for global and episodic information
Improves instruction fidelity and accuracy in conversational LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples conversation history into two memory modules
Uses priority attention to integrate episodic information
Maintains high instruction fidelity in long interactions
👥 Authors
Wanyang Hong
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Zhaoning Zhang
National University of Defense Technology
Yi Chen
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Libo Zhang
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Baihui Liu
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Linbo Qiao
National University of Defense Technology
Zhiliang Tian
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Dongsheng Li
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China