🤖 AI Summary
This work addresses the challenges of high computational cost and information redundancy that large language models face in long-context scenarios, which hinder their practical deployment. The authors propose a coarse-to-fine adaptive context compression framework that jointly optimizes semantic relevance and diversity to preserve critical information under high compression ratios. A key innovation is the Marginal Information Gain (MIG) metric, which drives a two-stage mechanism: coarse-grained dynamic grouping followed by fine-grained token-level weighted fusion within groups. The approach is compatible with mainstream large language model architectures. Experiments show significant gains over existing methods across multiple question-answering and summarization benchmarks; for instance, on the NaturalQuestions dataset it achieves an approximately 25-point improvement in Exact Match score at a 32× compression ratio.
📝 Abstract
Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse tasks. However, their deployment in long-context scenarios remains hindered by computational inefficiency and information redundancy. Context compression methods address these challenges by significantly reducing input length and eliminating redundancy. We propose COMI, a coarse-to-fine adaptive context compression framework that jointly optimizes for semantic relevance and diversity under high compression rates. We introduce Marginal Information Gain (MIG), a metric defined as the relevance of a unit to the input query minus its semantic redundancy with other units, guiding the compression process to prioritize information that is both relevant and minimally redundant. The framework operates in two stages: (1) Coarse-Grained Group Reallocation, where the context is partitioned into groups that are dynamically assigned compression rates based on inter-group MIG, ensuring compression budgets align with the distribution of information value; and (2) Fine-Grained Token Merging, where tokens within each group are fused via an intra-group MIG-based weighting mechanism, thereby preserving key semantics while avoiding the accumulation of redundancy. Extensive experiments on question answering (e.g., NaturalQuestions, 2WikiMQA, HotpotQA, and NarrativeQA) and summarization (e.g., MultiNews) with various backbones (e.g., LLaMA-2-7B, Qwen2-7B) show that COMI outperforms existing baselines by a large margin, e.g., an approximately 25-point Exact Match (EM) improvement under a 32× compression constraint with Qwen2-7B on NaturalQuestions.
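The abstract defines MIG as query relevance minus redundancy with other units, and uses it both to allocate per-group compression budgets and to weight token fusion. The sketch below illustrates that idea on embedding vectors; it is a minimal illustration, not the paper's implementation. The function names, the cosine-similarity relevance/redundancy proxies, and the softmax-based budget split are all assumptions introduced here for clarity.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity; small epsilon guards against zero-norm vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def marginal_information_gain(unit, query, others):
    # MIG = relevance to the query minus mean redundancy with the other units.
    relevance = cosine(unit, query)
    redundancy = np.mean([cosine(unit, o) for o in others]) if len(others) else 0.0
    return relevance - redundancy

def allocate_budgets(group_embs, query, total_budget):
    # Stage 1 (sketch): split the total token budget across groups in
    # proportion to softmax-normalized inter-group MIG, so high-value
    # groups keep more tokens. Rounding means the sum is only
    # approximately total_budget; each group keeps at least one token.
    migs = np.array([
        marginal_information_gain(
            g, query, [h for j, h in enumerate(group_embs) if j != i])
        for i, g in enumerate(group_embs)
    ])
    weights = np.exp(migs) / np.exp(migs).sum()
    return np.maximum(1, np.round(weights * total_budget)).astype(int)

def fuse_group(token_embs, query, k):
    # Stage 2 (sketch): merge a group's tokens into k fused vectors,
    # weighting each token by its softmax-normalized intra-group MIG so
    # relevant, non-redundant tokens dominate the fused representation.
    migs = np.array([
        marginal_information_gain(
            t, query, [u for j, u in enumerate(token_embs) if j != i])
        for i, t in enumerate(token_embs)
    ])
    w = np.exp(migs) / np.exp(migs).sum()
    chunks = np.array_split(np.arange(len(token_embs)), k)
    return np.stack([
        np.average(token_embs[idx], axis=0, weights=w[idx]) for idx in chunks
    ])
```

With hypothetical 8-dimensional embeddings, `allocate_budgets` returns one integer budget per group and `fuse_group` compresses a 10-token group down to `k` fused vectors, mirroring the coarse-to-fine flow the abstract describes.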