🤖 AI Summary
In real-time code editing, a large language model (LLM) must efficiently re-predict the next token after each edit. Existing incremental KV cache updates risk temporal misalignment, whereas full recomputation incurs prohibitive computational overhead. This paper proposes Positional Integrity Encoding (PIE), which removes the Rotary Position Embedding (RoPE) matrices that cause temporal confusion in the key cache and re-applies the correct ones. This corrects the cache with a single round of matrix multiplication while preserving the positional relationships between tokens. PIE is evaluated with DeepSeek-Coder models of 1.3B, 6.7B, and 33B parameters on RepoBench-C-8k across code insertion, code deletion, and multi-place code editing tasks. Compared with full recomputation, PIE reduces computational overhead by over 85% while closely approximating the model's prediction performance. It also substantially outperforms the simpler strategy of encoding only the edited subsequence.
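One way to see why a single matrix multiplication can suffice is the following minimal derivation. It assumes the standard RoPE formulation, where d is the head dimension, b is the frequency base (commonly 10000), and k is a key vector before rotation. None of these symbols are defined in the summary itself. Each 2×2 block is an orthogonal rotation, and rotations by angles in the same plane compose additively. As a result, a key cached at position m can be moved to position n by one further rotation.

```latex
R_{\Theta,m} = \operatorname{diag}\big(R(m\theta_1), \dots, R(m\theta_{d/2})\big),
\qquad
R(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix},
\qquad
\theta_i = b^{-2(i-1)/d}

\tilde{k}_m = R_{\Theta,m}\, k
\quad\Longrightarrow\quad
R_{\Theta,n}\, k = R_{\Theta,n} R_{\Theta,m}^{\top}\, \tilde{k}_m = R_{\Theta,n-m}\, \tilde{k}_m
```

Whether PIE fuses the removal and re-application exactly this way is an assumption here. The summary states only that the RoPE matrix is removed and then re-applied.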
📝 Abstract
In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and asks a code assistant, e.g., a large language model (LLM), to re-predict the next token or next line on the fly. Naively, the LLM must re-encode the entire sequence to rebuild its KV cache before it can make an accurate prediction. However, this process is computationally expensive, especially when the sequence is long. Simply encoding the edited subsequence and integrating it into the original KV cache suffers from the temporal confusion problem, which leads to significantly worse performance. We address this trade-off between efficiency and accuracy by introducing Positional Integrity Encoding (PIE). Building upon rotary positional encoding (RoPE), PIE first removes the rotary matrices in the Key cache that introduce temporal confusion and then reapplies the correct rotary matrices. This process ensures that the positional relationships between tokens are correct and requires only a single round of matrix multiplication. We validate the effectiveness of PIE through extensive experiments on the RepoBench-C-8k dataset, using DeepSeek-Coder models with 1.3B, 6.7B, and 33B parameters. Our evaluation covers three real-world coding tasks: code insertion, code deletion, and multi-place code editing. Results demonstrate that, across all model sizes and tasks, PIE reduces computational overhead by over 85% compared with the standard full-recomputation approach while closely approximating its performance.
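The sketch below illustrates the positional part of such a key-cache correction for a code insertion. It rests on several assumptions. It uses NumPy and the "rotate-half" RoPE pairing (i, i + d/2) found in LLaMA-style models; whether DeepSeek-Coder uses exactly this layout is an assumption. It also uses random stand-ins for the pre-rotation keys and holds them fixed, so that only positions change. The helpers `apply_rope` and `correct_key_cache` are hypothetical names, and this is not the authors' implementation. Keys cached at old positions m are moved to new positions n by one additional rotation of angle (n − m)θ_i. The corrected keys should therefore match keys rotated directly at the new positions.

```python
import numpy as np


def rope_angles(positions, dim, base=10000.0):
    """Angles m * theta_i for each position m and frequency index i; shape [len(positions), dim // 2]."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # theta_i = base^(-2i/dim)
    return np.outer(np.asarray(positions, dtype=np.float64), inv_freq)


def apply_rope(x, positions, base=10000.0):
    """Rotate each row of x (shape [seq, dim]) by its position, pairing dims (i, i + dim/2)."""
    dim = x.shape[-1]
    ang = rope_angles(positions, dim, base)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., : dim // 2], x[..., dim // 2 :]
    # Standard 2D rotation of each (x1, x2) pair by its angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


def correct_key_cache(cached_keys, old_positions, new_positions, base=10000.0):
    """Hypothetical helper: re-position cached keys via R(n) R(m)^T k_m = R(n - m) k_m."""
    delta = np.asarray(new_positions) - np.asarray(old_positions)  # negative for deletions
    return apply_rope(cached_keys, delta, base)


rng = np.random.default_rng(0)
seq_len, dim, insert_at, n_inserted = 8, 16, 3, 2
raw_keys = rng.standard_normal((seq_len, dim))  # stand-in pre-RoPE keys (assumption)
old_pos = np.arange(seq_len)
cache = apply_rope(raw_keys, old_pos)  # key cache before the edit

# Tokens at or after the insertion point shift right by n_inserted positions.
suffix = slice(insert_at, seq_len)
new_pos_suffix = old_pos[suffix] + n_inserted
corrected = correct_key_cache(cache[suffix], old_pos[suffix], new_pos_suffix)
reference = apply_rope(raw_keys[suffix], new_pos_suffix)  # keys rotated directly at new positions
print(np.allclose(corrected, reference))  # True
```

For a deletion, the shift n − m is negative and the same function applies unchanged. In a real model, the pre-rotation keys of tokens after the edit would also depend on the changed context. This sketch addresses only the positional mismatch that the abstract calls temporal confusion.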