🤖 AI Summary
This work addresses the high computational cost and information redundancy of long-context processing with large language models, as well as the instability and semantic fragmentation that positional bias causes in existing soft prompt compression methods. The authors propose Semantic Consistency Context Compression (SeCo), which abandons position-driven paradigms and instead compresses context dynamically on a purely semantic basis. SeCo selects query-relevant tokens as semantic anchors and aggregates the remaining tokens through consistency-weighted merging, preserving semantic coherence without relying on fixed positions or physical token layout. Experiments on fourteen benchmarks across two backbone models show that SeCo consistently outperforms prior methods in downstream task performance, inference latency, and out-of-domain robustness.
📝 Abstract
Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks. However, their deployment in long-context scenarios faces high computational overhead and information redundancy. While soft prompt compression has emerged as a promising way to mitigate these costs by compressing sequences into compact embeddings, existing paradigms remain fundamentally constrained by position bias: they primarily rely on inserting learnable tokens at fixed positions or on grouping tokens according to their physical layout, which induces performance instability and semantic fragmentation. To overcome this bottleneck, we propose Semantic Consistency Context Compression (SeCo), a method that shifts context compression from position-driven to semantics-driven. Rather than being constrained by the physical token layout, SeCo dynamically anchors compression directly in the semantic space by selecting query-relevant tokens as semantic centers and aggregating the remaining tokens via consistency-weighted merging. This design inherently preserves semantic consistency while eliminating position bias. Extensive experiments on 14 benchmarks across two backbone models demonstrate that SeCo consistently outperforms existing methods in downstream task performance, inference latency, and out-of-domain robustness. The code is available at https://anonymous.4open.science/r/seco-EE5E.
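The two-step idea described in the abstract (select query-relevant tokens as semantic centers, then merge the remaining tokens into them with consistency weights) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the scoring or weighting functions, so the use of cosine similarity for both relevance and consistency, the hard assignment of each token to a single anchor, and the names `cosine` and `compress` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def compress(tokens, query, k):
    """Compress `tokens` (list of vectors) into k merged vectors.

    Assumed scheme (hypothetical, for illustration only):
      1. Score each token by cosine similarity to the query.
      2. Keep the k highest-scoring tokens as semantic anchors.
      3. Assign every other token to its most similar anchor and merge it
         in with weight max(similarity, 0), so dissimilar tokens contribute
         little or nothing.
    Returns (anchor_indices, compressed_vectors).
    """
    if not 1 <= k <= len(tokens):
        raise ValueError("k must be between 1 and the number of tokens")

    # Step 1 and 2: query relevance, then top-k anchors.
    # Anchors are chosen by score, not by position in the sequence.
    relevance = [cosine(t, query) for t in tokens]
    ranked = sorted(range(len(tokens)), key=lambda i: -relevance[i])
    anchor_ids = sorted(ranked[:k])

    # Each anchor starts with its own embedding at weight 1.
    sums = {a: list(tokens[a]) for a in anchor_ids}
    weights = {a: 1.0 for a in anchor_ids}

    # Step 3: consistency-weighted merging of the non-anchor tokens.
    for i, tok in enumerate(tokens):
        if i in sums:
            continue
        sims = {a: cosine(tok, tokens[a]) for a in anchor_ids}
        best = max(anchor_ids, key=lambda a: sims[a])
        w = max(sims[best], 0.0)
        for d in range(len(tok)):
            sums[best][d] += w * tok[d]
        weights[best] += w

    # Normalize each merged vector by its total weight.
    compressed = [[x / weights[a] for x in sums[a]] for a in anchor_ids]
    return anchor_ids, compressed
```

Calling `compress(tokens, query, k)` on a list of token embeddings returns the indices of the chosen anchors and one merged vector per anchor. Because anchors and assignments depend only on similarities, shuffling the non-anchor tokens leaves the merged vectors unchanged up to floating-point rounding, which is the position-independence property the abstract emphasizes.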