🤖 AI Summary
This work addresses the issue of "dialogue inertia" in multi-turn agent interactions, where large language models tend to repetitively mimic their own prior responses, thereby limiting exploratory behavior. The study is the first to reveal a connection between this phenomenon and context length, and proposes a novel method for constructing preference pairs without requiring environmental rewards. By analyzing attention mechanisms to detect inertia and leveraging differences in context lengths to generate implicit preference signals, the approach integrates contextual preference learning with a dynamic context management strategy during inference. This effectively balances exploration and exploitation. Evaluated across eight agent environments and one in-depth case study, the method significantly reduces dialogue inertia and consistently improves task performance.
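The attention-based inertia detection described above might be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the function name `inertia_score`, the use of a single averaged attention matrix, and the span bookkeeping are all assumptions. The idea is simply to measure how much of the current response's attention mass falls on the model's own previous responses.

```python
import numpy as np

def inertia_score(attn, prev_response_spans, cur_span):
    """Hypothetical inertia metric: the fraction of the current response's
    attention mass directed at the model's own earlier responses.

    attn: (L, L) attention matrix averaged over heads/layers.
    prev_response_spans: list of (start, end) token ranges of prior responses.
    cur_span: (start, end) token range of the response being generated.
    """
    rows = attn[cur_span[0]:cur_span[1]]          # attention from current tokens
    total = rows.sum()
    if total == 0:
        return 0.0
    on_prev = sum(rows[:, s:e].sum() for s, e in prev_response_spans)
    return float(on_prev / total)                  # high value = strong inertia
```

A high score on a new response would flag the strong diagonal attention pattern the summary associates with imitation bias.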
📝 Abstract
Large language models excel as few-shot learners when provided with appropriate demonstrations, yet this strength becomes problematic in multi-turn agent scenarios, where LLMs erroneously mimic their own previous responses as few-shot examples. Through attention analysis, we identify conversational inertia, a phenomenon where models exhibit strong diagonal attention to previous responses, which is associated with imitation bias that constrains exploration. This reveals a tension when transforming few-shot LLMs into agents: longer context enriches environmental feedback for exploitation, yet also amplifies conversational inertia that undermines exploration. Our key insight is that for identical states, actions generated with longer contexts exhibit stronger inertia than those with shorter contexts, enabling construction of preference pairs without environment rewards. Based on this, we propose Context Preference Learning to calibrate model preferences to favor low-inertia responses over high-inertia ones. We further provide context management strategies at inference time to balance exploration and exploitation. Experimental results across eight agentic environments and one deep research scenario validate that our framework reduces conversational inertia and achieves performance improvements.
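The reward-free preference-pair construction described in the abstract can be sketched as below. This is a hedged illustration under stated assumptions: the pairing rule (same state, short-context action preferred over long-context action) follows the abstract, while the DPO-style loss is one plausible instantiation of Context Preference Learning; the paper's exact objective may differ, and all names here are hypothetical.

```python
import math

def build_preference_pair(state, short_ctx_action, long_ctx_action):
    """For the same state, the action generated with a shorter context is
    treated as preferred (lower inertia) over the one generated with a
    longer context -- no environment reward is required."""
    return {"state": state,
            "chosen": short_ctx_action,     # low-inertia response
            "rejected": long_ctx_action}    # high-inertia response

def preference_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style objective (assumed form): pushes the policy to assign
    relatively higher likelihood to the low-inertia (chosen) action than
    a frozen reference model does."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

In training, each pair would come from replaying the same state with two context lengths; the loss decreases as the policy shifts probability mass toward the low-inertia response.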