InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing model editing methods recall exact edit facts well but struggle in scenarios that require deeper semantic understanding, while in-context learning (ICL) editing is constrained by the LLM's limited context window, so performance and efficiency degrade as edits accumulate. The paper proposes InComeS, a framework that compresses each editing context into the key-value (KV) cache of a special gist token and adds specialized cross-attention modules that dynamically select the most relevant information from the resulting gist pools. This lets the model handle many edits without being restricted by its context window while using edit information adaptively. Experiments on diverse model editing benchmarks with various editing formats demonstrate the method's effectiveness and efficiency.

📝 Abstract
Although existing model editing methods perform well in recalling exact edit facts, they often struggle in complex scenarios that require deeper semantic understanding rather than mere knowledge regurgitation. Leveraging the strong contextual reasoning abilities of large language models (LLMs), in-context learning (ICL) becomes a promising editing method by comprehending edit information through context encoding. However, this method is constrained by the limited context window of LLMs, leading to degraded performance and efficiency as the number of edits increases. To overcome this limitation, we propose InComeS, a flexible framework that enhances LLMs' ability to process editing contexts through explicit compression and selection mechanisms. Specifically, InComeS compresses each editing context into the key-value (KV) cache of a special gist token, enabling efficient handling of multiple edits without being restricted by the model's context window. Furthermore, specialized cross-attention modules are added to dynamically select the most relevant information from the gist pools, enabling adaptive and effective utilization of edit information. We conduct experiments on diverse model editing benchmarks with various editing formats, and the results demonstrate the effectiveness and efficiency of our method.
Problem

Research questions and friction points this paper is trying to address.

ICL-based editing is constrained by the LLM's limited context window
Performance and efficiency degrade as the number of edits grows
Existing editing methods struggle in scenarios requiring deep semantic understanding rather than mere fact recall
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compresses each editing context into the KV cache of a special gist token
Handles many edits efficiently without expanding the context window
Dynamically selects relevant edit information from the gist pools via dedicated cross-attention modules
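The compress-then-select idea above can be sketched in a few lines of NumPy. The helper names (`compress_edit`, `cross_attend`) and the one-vector-per-edit simplification are illustrative assumptions, not the paper's implementation, which operates on KV caches inside the transformer layers:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compress_edit(edit_embedding):
    # Stand-in for compressing one edit context into the KV cache of a
    # single gist token: the pooled edit embedding serves as both key
    # and value (a hypothetical simplification).
    return edit_embedding, edit_embedding

def cross_attend(query, gist_keys, gist_values):
    # Dedicated cross-attention: the query selects from the gist pool.
    d = query.shape[-1]
    scores = query @ gist_keys.T / np.sqrt(d)   # (1, num_edits)
    weights = softmax(scores, axis=-1)          # attention over edits
    return weights @ gist_values, weights

rng = np.random.default_rng(0)
d = 64
edits = [rng.normal(size=d) for _ in range(5)]  # pooled edit contexts
keys, values = map(np.stack, zip(*(compress_edit(e) for e in edits)))

query = edits[2][None, :]                       # a prompt matching edit #2
selected, weights = cross_attend(query, keys, values)
print(int(weights.argmax()))                    # the matching edit dominates
```

Because each edit contributes a fixed number of gist tokens rather than its full token sequence, the pool grows linearly in the number of edits while the model's context window stays untouched.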