Context Compression via Explicit Information Transmission

πŸ“… 2026-02-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the high computational cost of attention mechanisms and key-value caching in long-context reasoning, which motivates efficient soft compression methods. The authors propose ComprExIT, a lightweight framework for soft context compression that operates on the frozen hidden states of large language models. ComprExIT decouples compression from self-attention dynamics through an explicit information transfer mechanism that combines depth-wise and width-wise information propagation: the former mitigates layer-wise representation overwriting, while the latter enables globally coordinated information allocation. Concretely, the approach integrates token-anchor-based multi-layer selective transfer with a globally optimized slot aggregation mechanism. Evaluated on six question-answering benchmarks, ComprExIT significantly outperforms state-of-the-art methods, achieving substantially improved compression efficacy and robustness with only about 1% additional parameters.

πŸ“ Abstract
Long-context inference with Large Language Models (LLMs) is costly due to quadratic attention and growing key-value caches, motivating context compression. In this work, we study soft context compression, where a long context is condensed into a small set of continuous representations. Existing methods typically re-purpose the LLM itself as a trainable compressor, relying on layer-by-layer self-attention to iteratively aggregate information. We argue that this paradigm suffers from two structural limitations: (i) progressive representation overwriting across layers, and (ii) uncoordinated allocation of compression capacity across tokens. We propose ComprExIT (Context Compression via Explicit Information Transmission), a lightweight framework that recasts soft compression as a new paradigm: explicit information transmission over frozen LLM hidden states. This decouples compression from the model's internal self-attention dynamics. ComprExIT performs (i) depth-wise transmission to selectively transmit multi-layer information into token anchors, mitigating progressive overwriting, and (ii) width-wise transmission to aggregate anchors into a small number of slots via a globally optimized transmission plan, ensuring coordinated allocation of information. Across six question-answering benchmarks, ComprExIT consistently outperforms state-of-the-art context compression methods while introducing only ~1% additional parameters, demonstrating that explicit and coordinated information transmission enables more effective and robust long-context compression.
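The two transmission stages described in the abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the shapes, the softmax-style depth weighting, and the globally normalized transmission plan are all assumptions made to show the overall data flow (frozen hidden states β†’ token anchors β†’ compressed slots).

```python
# Illustrative sketch of ComprExIT-style two-stage compression.
# Assumed setup: frozen hidden states of shape (layers, tokens, dim);
# all weights here are random stand-ins for learned parameters.
import numpy as np

rng = np.random.default_rng(0)
L, T, D, K = 4, 16, 8, 3          # layers, tokens, hidden dim, slots

hidden = rng.normal(size=(L, T, D))   # frozen LLM hidden states

# Depth-wise transmission: each token anchor selectively gathers
# information across layers, mitigating layer-wise overwriting.
depth_logits = rng.normal(size=(T, L))        # learned in practice
depth_w = np.exp(depth_logits)
depth_w /= depth_w.sum(axis=1, keepdims=True) # softmax over layers
anchors = np.einsum('tl,ltd->td', depth_w, hidden)   # (T, D)

# Width-wise transmission: aggregate anchors into K slots through a
# single globally normalized plan, coordinating allocation of
# compression capacity across all tokens at once.
plan_logits = rng.normal(size=(K, T))         # learned in practice
plan = np.exp(plan_logits)
plan /= plan.sum()                            # global normalization
slots = plan @ anchors                        # (K, D) compressed context

print(slots.shape)  # (3, 8)
```

Note the design contrast the abstract draws: normalizing the plan globally (rather than per slot or per token) is what would let allocation be coordinated across the whole context instead of decided independently per position.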
Problem

Research questions and friction points this paper is trying to address.

context compression
long-context inference
large language models
soft compression
information transmission
Innovation

Methods, ideas, or system contributions that make the work stand out.

context compression
explicit information transmission
large language models
soft compression
key-value cache optimization