From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors

📅 2026-04-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational cost and unreliable handling of very long inputs in long-context large language models. Existing context compression methods struggle to preserve task relevance, topic coverage, and cross-sentence coherence at the same time under a strict token budget. The authors propose a training-free, model-agnostic compression framework built on a structural graph prior: a sparse hybrid sentence graph that combines mutual k-NN semantic edges with short-range sequential edges. Clustering on this graph extracts a topic skeleton, and sentences are then selected greedily under the token budget using an interpretable score that integrates task relevance, cluster representativeness, bridge centrality, and a cycle coverage cue, with redundancy suppression and the original sentence order preserved. Experiments on four datasets show that the approach is competitive with strong extractive and abstractive baselines, with larger gains on long-document benchmarks.
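
As a rough illustration of the hybrid sentence graph described above, the following minimal sketch builds mutual k-NN semantic edges from sentence embeddings and adds short-range sequential edges. It is not the authors' implementation: the function names (`cosine`, `hybrid_graph`), the parameters `k` and `window`, the use of cosine similarity, and the unweighted edge representation are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_graph(embeddings, k=2, window=1):
    """Return a sorted list of undirected edges (i, j) with i < j.

    Assumed construction (hypothetical, for illustration only):
      * semantic edge i-j if j is among i's top-k neighbours AND
        i is among j's top-k neighbours (mutual k-NN);
      * sequential edge i-j if 0 < j - i <= window.
    """
    n = len(embeddings)
    # Top-k neighbour set of every sentence, ties broken by index.
    knn = []
    for i in range(n):
        sims = [(cosine(embeddings[i], embeddings[j]), j)
                for j in range(n) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))
        knn.append({j for _, j in sims[:k]})

    edges = set()
    # Mutual k-NN semantic edges.
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:
                edges.add((min(i, j), max(i, j)))
    # Short-range sequential edges between nearby sentences.
    for i in range(n):
        for d in range(1, window + 1):
            if i + d < n:
                edges.add((i, i + d))
    return sorted(edges)
```

Requiring the k-NN relation to hold in both directions keeps the semantic part of the graph sparse, while the sequential edges guarantee that neighbouring sentences stay connected even when their embeddings differ.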

📝 Abstract

Long-context large language models remain computationally expensive to run and often fail to reliably process very long inputs, which makes context compression an important component of many systems. Existing compression approaches typically rely on trained compressors, dense retrieval-style selection, or heuristic trimming, and they often struggle to jointly preserve task relevance, topic coverage, and cross-sentence coherence under a strict token budget. To address this, we propose a training-free and model-agnostic compression framework that selects a compact set of sentences guided by structural graph priors. Our method constructs a sparse hybrid sentence graph that combines mutual k-NN semantic edges with short-range sequential edges, extracts a topic skeleton via clustering, and ranks sentences using an interpretable score that integrates task relevance, cluster representativeness, bridge centrality, and a cycle coverage cue. A budgeted greedy selection with redundancy suppression then produces a readable compressed context in original order. Experimental results on four datasets show that our approach is competitive with strong extractive and abstractive baselines, demonstrating larger gains on long-document benchmarks.
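
The final step in the abstract, budgeted greedy selection with redundancy suppression, can be sketched as below. This is a hedged illustration under stated assumptions rather than the paper's procedure: the per-sentence scores are taken as already computed, and the function name `greedy_select`, the `similarity` callback, and the `max_sim` threshold are hypothetical names introduced here.

```python
def greedy_select(scores, token_counts, similarity, budget, max_sim=0.8):
    """Pick sentence indices under a token budget, highest score first.

    scores        -- one precomputed score per sentence (assumed given)
    token_counts  -- token length of each sentence
    similarity    -- callable (i, j) -> float, pairwise sentence similarity
    budget        -- maximum total tokens in the compressed context
    max_sim       -- hypothetical redundancy threshold: a candidate is
                     skipped if it is this similar to a chosen sentence
    """
    # Visit candidates from highest to lowest score, ties broken by position.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    chosen, used = [], 0
    for i in order:
        if used + token_counts[i] > budget:
            continue  # does not fit in the remaining budget
        if any(similarity(i, j) >= max_sim for j in chosen):
            continue  # too similar to something already selected
        chosen.append(i)
        used += token_counts[i]
    # Restore original document order so the output stays readable.
    return sorted(chosen)
```

A small usage example, with a hand-written similarity matrix in which sentences 0 and 1 are near-duplicates:

```python
sim = [[1.0, 0.95, 0.1, 0.1],
       [0.95, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.1],
       [0.1, 0.1, 0.1, 1.0]]
picked = greedy_select([0.9, 0.85, 0.3, 0.6], [10, 10, 5, 8],
                       lambda i, j: sim[i][j], budget=20)
print(picked)  # [0, 3]
```

Sentence 1 is dropped as redundant with sentence 0, and sentence 2 no longer fits once 18 of the 20 tokens are used.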

Problem

Research questions and friction points this paper is trying to address.

context compression

long-context LLMs

token budget

coherence preservation

topic coverage

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free compression

hybrid graph priors

sentence graph