🤖 AI Summary
This work addresses the high computational cost and unreliable handling of very long inputs in long-context large language models. Existing context compression methods struggle to jointly preserve task relevance, topic coverage, and cross-sentence coherence under a strict token budget. The authors propose a training-free, model-agnostic compression framework guided by a structural graph prior: a sparse hybrid sentence graph that combines mutual k-NN semantic edges with short-range sequential edges. Graph clustering extracts a topic skeleton, and sentences are ranked with an interpretable score that integrates task relevance, cluster representativeness, bridge centrality, and a cycle coverage cue. A budgeted greedy selection with redundancy suppression then picks sentences and emits them in their original order. Experiments on four datasets show that the approach is competitive with strong extractive and abstractive baselines, with larger gains on long-document benchmarks.
📝 Abstract
Long-context large language models remain computationally expensive to run and often fail to reliably process very long inputs, which makes context compression an important component of many systems. Existing compression approaches typically rely on trained compressors, dense retrieval-style selection, or heuristic trimming, and they often struggle to jointly preserve task relevance, topic coverage, and cross-sentence coherence under a strict token budget. To address this, we propose a training-free and model-agnostic compression framework that selects a compact set of sentences guided by structural graph priors. Our method constructs a sparse hybrid sentence graph that combines mutual k-NN semantic edges with short-range sequential edges, extracts a topic skeleton via clustering, and ranks sentences using an interpretable score that integrates task relevance, cluster representativeness, bridge centrality, and a cycle coverage cue. A budgeted greedy selection with redundancy suppression then produces a readable compressed context in original order. Experimental results on four datasets show that our approach is competitive with strong extractive and abstractive baselines, demonstrating larger gains on long-document benchmarks.
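To make the pipeline more concrete, here is a minimal sketch of two of its steps: building a hybrid sentence graph (mutual k-NN semantic edges plus short-range sequential edges) and running a budgeted greedy selection with redundancy suppression. It is an illustration under assumptions, not the authors' implementation. The function names, the parameters `k`, `window`, and `max_sim`, and the similarity-threshold form of redundancy suppression are all hypothetical. The clustering step and the multi-part ranking score are not reproduced, so per-sentence scores are taken as given.

```python
import math
from typing import List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_hybrid_graph(
    emb: List[List[float]], k: int = 2, window: int = 1
) -> Tuple[Set[Tuple[int, int]], List[List[float]]]:
    """Return undirected edges (i, j) with i < j, plus the similarity matrix.

    Assumed construction: an edge exists if i and j are in each other's
    top-k neighbours (mutual k-NN) or are at most `window` positions apart.
    """
    n = len(emb)
    sim = [[cosine(emb[i], emb[j]) for j in range(n)] for i in range(n)]

    # Top-k most similar sentences for each sentence, excluding itself.
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])
        knn.append(set(others[:k]))

    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:  # keep only mutual neighbours
                edges.add((min(i, j), max(i, j)))
        for d in range(1, window + 1):  # short-range sequential edges
            if i + d < n:
                edges.add((i, i + d))
    return edges, sim


def greedy_select(
    scores: List[float],
    token_counts: List[int],
    sim: List[List[float]],
    budget: int,
    max_sim: float = 0.9,
) -> List[int]:
    """Pick sentences by descending score without exceeding the token budget.

    A candidate is skipped if it is too similar to an already chosen
    sentence (a simple stand-in for redundancy suppression). The result is
    returned in original document order.
    """
    chosen: List[int] = []
    used = 0
    for i in sorted(range(len(scores)), key=lambda i: -scores[i]):
        if used + token_counts[i] > budget:
            continue
        if any(sim[i][j] >= max_sim for j in chosen):
            continue
        chosen.append(i)
        used += token_counts[i]
    return sorted(chosen)


# Toy example: four sentences with 2-d embeddings (made-up values).
emb = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
edges, sim = build_hybrid_graph(emb, k=1, window=1)
kept = greedy_select([0.9, 0.8, 0.7, 0.1], [5, 5, 5, 5], sim, budget=10)
```

In this toy run, sentences 0 and 1 are near-duplicates, so the second-highest-scoring sentence is suppressed and the budget goes to sentence 2 instead. In a fuller version, the `scores` list would come from a ranking function that combines the four cues named in the abstract.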