🤖 AI Summary
Self-attention's quadratic computational complexity impedes efficient long-context processing in large language models (LLMs), and existing compression methods suffer from poor semantic fidelity, positional bias, and weak long-range dependency modeling. To address these issues, this paper proposes an adaptive hierarchical compression framework. The method dynamically identifies information-dense regions to construct a semantic binary tree, in which variable-length gist tokens serve as leaf nodes, and introduces a lightweight hierarchical aggregation mechanism. Critically, it preserves both fine-grained local detail and global semantic consistency without modifying the frozen backbone model. Compared with state-of-the-art explicit and implicit compression approaches, the framework substantially reduces computational overhead while improving inference efficiency and context fidelity across multiple long-text benchmarks, with a negligible additional parameter count.
📝 Abstract
The quadratic complexity of self-attention constrains the ability of Large Language Models (LLMs) to process long contexts, a capability essential for many advanced applications. Context compression aims to alleviate this computational bottleneck while retaining critical semantic information. However, existing approaches often fall short: explicit methods may compromise local detail, whereas implicit methods can suffer from positional biases, information degradation, or an inability to capture long-range semantic dependencies. We propose AdmTree, a novel framework for adaptive, hierarchical context compression with a central focus on preserving high semantic fidelity while maintaining efficiency. AdmTree dynamically segments input based on information density, using gist tokens to summarize variable-length segments as the leaves of a semantic binary tree. This structure, together with a lightweight aggregation mechanism and a frozen backbone LLM (thereby minimizing new trainable parameters), enables efficient hierarchical abstraction of the context. By preserving fine-grained details alongside global semantic coherence, mitigating positional bias, and dynamically adapting to content, AdmTree robustly retains the semantic information of long contexts.
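The pipeline the abstract describes — adaptive segmentation by information density, gist summaries at the leaves, and pairwise aggregation up a binary tree — can be illustrated with a minimal sketch. This is not the paper's implementation: the density proxy (unique-token ratio), the gist operator (mean of token embeddings), and all thresholds are illustrative assumptions; in AdmTree the gists would be learned tokens produced by the frozen LLM.

```python
# Hypothetical sketch of hierarchical gist aggregation (NOT the AdmTree code).
# Assumptions: "information density" = fraction of distinct tokens in a segment;
# a segment's "gist" = mean of its token embeddings; internal nodes average
# their two children's gists until a single root summarizes the full context.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Node:
    gist: List[float]        # summary vector for this subtree
    span: Tuple[int, int]    # (start, end) token indices covered

def density(tokens: List[str]) -> float:
    """Toy information-density proxy: ratio of distinct tokens."""
    return len(set(tokens)) / max(len(tokens), 1)

def segment(tokens: List[str], max_len: int = 8, min_density: float = 0.5):
    """Greedy adaptive segmentation: grow a segment until it is either long
    or repetitive (density drops below the threshold), then cut."""
    segs, start = [], 0
    for i in range(1, len(tokens) + 1):
        cur = tokens[start:i]
        if len(cur) >= max_len or (len(cur) > 2 and density(cur) < min_density):
            segs.append((start, i))
            start = i
    if start < len(tokens):
        segs.append((start, len(tokens)))
    return segs

def mean(vectors: List[List[float]]) -> List[float]:
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def build_tree(embeddings: List[List[float]], spans) -> Node:
    """Leaves hold per-segment gists; adjacent nodes are merged pairwise,
    level by level, into a semantic binary tree."""
    level = [Node(mean(embeddings[s:e]), (s, e)) for s, e in spans]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            nxt.append(Node(mean([a.gist, b.gist]), (a.span[0], b.span[1])))
        if len(level) % 2:          # odd node carries over to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

On a toy input, `segment("a b a a b c d e".split())` cuts where repetition lowers the density score, and `build_tree` then reduces the resulting leaf gists to one root whose span covers the whole sequence — mirroring, at a cartoon level, how a hierarchy of gists can expose both local segments and a global summary.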