🤖 AI Summary
This work challenges the prevailing assumption that retrieval heads in large language models operate as static mechanisms, demonstrating instead that they exhibit temporal dynamics, sequence-specific behavior, and predictability during autoregressive generation. The study reveals an implicit internal planning mechanism governing retrieval behavior within the model. The authors validate this perspective through fine-grained timestep analysis, comparative evaluation of dynamic versus static retrieval heads, probing of hidden-state signals, and a novel dynamic retrieval-augmented generation framework. Experimental results on Needle-in-a-Haystack and multi-hop question answering benchmarks show that dynamic retrieval heads outperform their static counterparts by a statistically significant margin.
📝 Abstract
Recent studies have identified "retrieval heads" in Large Language Models (LLMs) responsible for extracting information from input contexts. However, prior work largely relies on static statistics aggregated across datasets, identifying heads that perform retrieval on average. This perspective overlooks the fine-grained temporal dynamics of autoregressive generation. In this paper, we investigate retrieval heads from a dynamic perspective. Through extensive analysis, we establish three core claims: (1) Dynamism: Retrieval heads vary dynamically across timesteps; (2) Irreplaceability: Dynamic retrieval heads are specific to each timestep and cannot be effectively replaced by static retrieval heads; and (3) Correlation: The model's hidden state encodes a predictive signal for future retrieval head patterns, indicating an internal planning mechanism. We validate these findings on the Needle-in-a-Haystack task and a multi-hop QA task, and quantify the differences in the utility of dynamic and static retrieval heads in a Dynamic Retrieval-Augmented Generation framework. Our study provides new insights into the internal mechanisms of LLMs.
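To make the static versus dynamic distinction concrete, the following is a minimal sketch on synthetic data, not the authors' method. It assumes a hypothetical attention tensor `attn` of shape (timesteps, heads, context tokens) and a hypothetical array `target_pos` giving the context token being copied at each decoding step. The scoring rule (attention mass on the target token) is a simplification chosen for illustration; the paper's actual retrieval score may differ. All function names are illustrative.

```python
import numpy as np

def per_step_scores(attn, target_pos):
    """Attention mass each head places on the token being retrieved.

    attn:       (T, H, N) attention of H heads over N context tokens at T steps.
    target_pos: (T,) index of the context token copied at each step.
    Returns a (T, H) array of per-timestep, per-head scores.
    """
    steps = np.arange(attn.shape[0])
    return attn[steps, :, target_pos]

def static_heads(scores, k):
    """Top-k heads by score averaged over all timesteps (the static view)."""
    mean_score = scores.mean(axis=0)
    return set(np.argsort(-mean_score)[:k].tolist())

def dynamic_heads(scores, k):
    """Top-k heads chosen separately at every timestep (the dynamic view)."""
    return [set(np.argsort(-row)[:k].tolist()) for row in scores]

def toy_attention():
    """Synthetic example: heads 0-2 each retrieve at one step only,
    while head 3 is moderately focused on the target at every step."""
    T, H, N = 3, 4, 5
    attn = np.full((T, H, N), 0.2)      # uniform attention by default
    target_pos = np.array([0, 1, 2])
    for t in range(T):
        attn[t, t] = 0.05               # head t is sharply focused at step t
        attn[t, t, target_pos[t]] = 0.8
        attn[t, 3] = 0.125              # head 3 is moderately focused always
        attn[t, 3, target_pos[t]] = 0.5
    return attn, target_pos

if __name__ == "__main__":
    attn, target_pos = toy_attention()
    scores = per_step_scores(attn, target_pos)
    print("static top-1: ", static_heads(scores, k=1))
    print("dynamic top-1:", dynamic_heads(scores, k=1))
```

In this toy setup the head with the best average score is never the strongest head at any single timestep, which is the kind of mismatch the Dynamism and Irreplaceability claims describe.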