🤖 AI Summary
Traditional KV cache eviction strategies, e.g., top-k attention pruning, rely on static heuristics and fail to capture dynamically evolving, implicit token dependencies, leading to context loss and suboptimal memory efficiency in long-sequence inference. To address this, we propose GraphKV: a graph-based framework that constructs a token similarity graph and uses a decay-based signal-propagation mechanism to dynamically update attention importance, enabling adaptive retention of salient tokens in the KV cache. GraphKV moves beyond static pruning and integrates with mainstream compression methods, including SnapKV and PyramidKV, in a plug-and-play manner. Experiments demonstrate that GraphKV reduces memory footprint while improving both generation quality and inference throughput for long sequences. The implementation will be open-sourced.
📝 Abstract
Efficient Key-Value (KV) cache management is essential for processing long text sequences in large language models (LLMs), where memory constraints often limit performance. Conventional KV eviction strategies, such as top-k selection based on attention scores, depend on static heuristics that fail to capture the evolving implicit dependencies among tokens during inference. To overcome this, we propose GraphKV, a graph-based framework that redefines token selection for KV cache compression. In GraphKV, tokens are modeled as nodes with importance scores, and edges represent their similarity relationships. Through a decay-signal-propagation mechanism, token importance is dynamically updated by propagating information across the graph, enabling adaptive retention of the most contextually significant tokens. GraphKV can be integrated into existing KV cache eviction methods such as SnapKV and PyramidKV in a plug-and-play manner. Code will be released on GitHub.
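To make the mechanism concrete, here is a minimal sketch of decay-signal propagation over a token similarity graph for cache eviction. This is an illustration under stated assumptions, not the paper's implementation: the function name `graphkv_select`, the cosine-similarity graph construction, and the parameters `decay`, `hops`, and `top_edges` are all hypothetical.

```python
import torch

def graphkv_select(keys, attn_scores, budget, decay=0.5, hops=2, top_edges=8):
    """Hypothetical sketch of GraphKV-style token selection (not the official code).

    keys:        [seq_len, head_dim] key vectors for one attention head
    attn_scores: [seq_len] initial per-token importance (e.g., summed attention)
    budget:      number of tokens to keep in the KV cache
    decay:       attenuation per propagation hop (assumed)
    hops:        number of propagation steps over the graph (assumed)
    top_edges:   neighbors kept per node when sparsifying the graph (assumed)
    """
    # Build a sparse token-similarity graph from cosine similarity of keys.
    k = torch.nn.functional.normalize(keys, dim=-1)
    sim = k @ k.T                                    # [seq_len, seq_len]
    sim.fill_diagonal_(float("-inf"))                # exclude self-loops
    vals, idx = sim.topk(top_edges, dim=-1)          # keep strongest edges per node
    adj = torch.zeros_like(sim).scatter_(-1, idx, vals.softmax(dim=-1))

    # Decay-signal propagation: each hop spreads a geometrically attenuated
    # copy of the current signal to graph neighbors, then accumulates it.
    scores = attn_scores.clone()
    signal = attn_scores.clone()
    for _ in range(hops):
        signal = decay * (adj.T @ signal)            # neighbors receive decayed signal
        scores = scores + signal

    # Retain the tokens with the highest propagated importance.
    return scores.topk(budget).indices.sort().values
```

A toy call such as `graphkv_select(torch.randn(1024, 64), torch.rand(1024), budget=256)` returns the indices of the 256 tokens to retain; sparsifying the graph to `top_edges` neighbors keeps each propagation hop cheap, which is what would let such a scoring step slot into existing eviction pipelines like SnapKV or PyramidKV in place of their raw top-k selection.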