🤖 AI Summary
Current LLMs, constrained by autoregressive sequential modeling, struggle to capture structural dependencies (such as graph relationships) among text segments, limiting their effectiveness in RAG and graph-structured reasoning tasks. To address this, we propose a structure-aware KV caching mechanism featuring a novel graph-structured block masking attention: it decouples positional encoding from topological dependency, enabling each target segment to attend exclusively to the KV representations of its designated source segments, thereby realizing graph-guided sparse attention and message-passing-style context aggregation. The method supports end-to-end training without modifying the model backbone. Evaluated on seven RAG benchmarks, Arxiv-QA (a graph-based QA task), and citation network classification, our approach consistently outperforms sequential baselines, significantly mitigating positional bias while enhancing long-range and multi-hop reasoning.
📝 Abstract
Modern large language models (LLMs) are inherently auto-regressive, requiring input to be serialized into flat sequences regardless of their structural dependencies. This serialization hinders the model's ability to leverage structural inductive biases, especially in tasks such as retrieval-augmented generation (RAG) and reasoning on data with native graph structures, where inter-segment dependencies are crucial. We introduce Graph-KV to overcome this limitation. Graph-KV leverages the KV-cache of text segments as condensed representations and governs their interaction through structural inductive biases. In this framework, 'target' segments selectively attend only to the KV-caches of their designated 'source' segments, rather than to all preceding segments in a serialized sequence. This approach induces a graph-structured block mask, sparsifying attention and enabling a message-passing-like step within the LLM. Furthermore, strategically allocated positional encodings for source and target segments reduce positional bias and context window consumption. We evaluate Graph-KV across three scenarios: (1) seven RAG benchmarks spanning direct inference, multi-hop reasoning, and long-document understanding; (2) Arxiv-QA, a novel academic paper QA task with full-text scientific papers structured as citation ego-graphs; and (3) paper topic classification within a citation network. By effectively reducing positional bias and harnessing structural inductive biases, Graph-KV substantially outperforms baselines, including standard costly sequential encoding, across various settings. Code and the Graph-KV data are publicly available.
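To make the core idea concrete, here is a minimal sketch of how a graph-structured block mask could be built: tokens within a segment attend causally to themselves, and each 'target' segment additionally attends to the full KV blocks of its 'source' segments. The function name `graph_block_mask` and the `(target, source)` edge format are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def graph_block_mask(seg_lens, edges):
    """Build a boolean attention mask over concatenated segments.

    seg_lens: token count of each text segment.
    edges: (target_idx, source_idx) pairs; the target segment may
           attend to the source segment's KV block.
    Note: this is an illustrative sketch of the block-masking idea,
    not the paper's implementation (which also reallocates positional
    encodings for source/target segments).
    """
    # prefix offsets: segment i spans tokens offsets[i]:offsets[i+1]
    offsets = np.concatenate(([0], np.cumsum(seg_lens)))
    total = offsets[-1]
    mask = np.zeros((total, total), dtype=bool)

    # causal self-attention inside each segment block
    for i, n in enumerate(seg_lens):
        s, e = offsets[i], offsets[i + 1]
        mask[s:e, s:e] = np.tril(np.ones((n, n), dtype=bool))

    # graph edges: target tokens see the whole source KV block
    for tgt, src in edges:
        ts, te = offsets[tgt], offsets[tgt + 1]
        ss, se = offsets[src], offsets[src + 1]
        mask[ts:te, ss:se] = True
    return mask
```

For example, with three segments of lengths [2, 3, 2] and a single edge (2, 0), segment 2 attends to itself (causally) and to segment 0's KV block, but never to segment 1, which is how attention is sparsified relative to full sequential encoding.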