🤖 AI Summary
This work addresses a limitation of existing soft context compression methods: they employ a uniform compression ratio, which fails to accommodate the substantial variation in information density across natural language. To overcome this, we propose a semi-dynamic context compression framework featuring a discrete ratio selector grounded in local information density. The selector predicts a compression ratio and quantizes it to a predefined discrete set, sidestepping the training instability associated with continuous, input-dependent hyperparameters; it is jointly trained with a mean-pooling-based backbone compressor on synthetic data. Empirical evaluations demonstrate that the approach consistently outperforms static compression baselines across multiple benchmarks, establishing a robust Pareto frontier for context compression techniques.
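To make the quantization step concrete, here is a minimal sketch of snapping a continuous predicted ratio to a predefined discrete set. The candidate ratios and the selector's interface are assumptions for illustration, not the paper's actual values.

```python
import torch

# Assumed candidate ratios; the paper's actual discrete set may differ.
CANDIDATE_RATIOS = torch.tensor([2.0, 4.0, 8.0, 16.0])

def quantize_ratio(predicted: torch.Tensor) -> torch.Tensor:
    """Snap a continuous predicted compression ratio to the nearest
    candidate, so downstream modules only ever see discrete values."""
    idx = torch.argmin((CANDIDATE_RATIOS - predicted).abs())
    return CANDIDATE_RATIOS[idx]

print(quantize_ratio(torch.tensor(5.3)))  # tensor(4.)
```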
📝 Abstract
Soft context compression reduces the computational cost of processing long contexts in LLMs by encoding a long context into a smaller number of latent tokens. However, existing frameworks apply uniform compression ratios, failing to account for the extreme variance in the information density of natural language. While adopting a density-aware dynamic compression ratio seems intuitive, empirical investigations reveal that models intrinsically struggle with operations parameterized by input-dependent, continuous structural hyperparameters. To resolve this pitfall, we introduce the Semi-Dynamic Context Compression framework. Our approach features a Discrete Ratio Selector, which predicts a compression target from the intrinsic information density of the input and quantizes it to a predefined set of discrete compression ratios. The selector is trained jointly and efficiently with the compressor on synthetic data, using summary lengths as a proxy to create labels for compression-ratio prediction. Extensive evaluations confirm that our density-aware framework, which uses mean pooling as the backbone, consistently outperforms static baselines, establishing a robust Pareto frontier for context compression techniques. Our code, data, and model weights are available at https://github.com/yuyijiong/semi-dynamic-context-compress
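As an illustration of the mean-pooling backbone the abstract mentions, the sketch below compresses a sequence of hidden states into `seq_len // ratio` latent tokens. The shape conventions and the handling of the ragged tail are assumptions, not the released implementation.

```python
import torch

def mean_pool_compress(hidden: torch.Tensor, ratio: int) -> torch.Tensor:
    """Compress (seq_len, dim) token states into seq_len // ratio latent
    tokens by averaging non-overlapping windows of `ratio` tokens."""
    seq_len, dim = hidden.shape
    usable = (seq_len // ratio) * ratio        # drop the ragged tail for simplicity
    windows = hidden[:usable].view(-1, ratio, dim)
    return windows.mean(dim=1)                 # one latent token per window

hidden = torch.randn(128, 64)                  # toy hidden states for one context
latents = mean_pool_compress(hidden, ratio=4)  # a denser span might instead get ratio 2
print(latents.shape)                           # torch.Size([32, 64])
```

In the density-aware setting, the quantized ratio produced by the selector would be passed in place of the fixed `ratio=4` used here.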