🤖 AI Summary
Existing prompt compression methods rely on task-specific questions or handcrafted templates, limiting their generalizability. This paper proposes Task-Agnostic Prompt Compression (TPC), a framework that achieves cross-task and cross-domain compression without requiring any task priors, such as question inputs or predefined templates. The approach features three key innovations: (1) multi-granularity importance scoring based on context-aware sentence embeddings; (2) a reinforcement learning-driven task descriptor that jointly models contextual relevance and generates adaptive compression policies in an end-to-end manner; and (3) native support for multi-scale model deployment. Experiments on LongBench and ZeroSCROLLS demonstrate that TPC surpasses state-of-the-art methods: it delivers superior performance at the largest model size while significantly reducing the parameter count of the base model without sacrificing accuracy, achieving a strong trade-off between efficiency and generalization.
📝 Abstract
The rise of Large Language Models (LLMs) has led to significant interest in prompt compression, a technique aimed at reducing the length of input prompts while preserving critical information. However, prominent approaches to prompt compression often require explicit questions or handcrafted templates, limiting their generalizability. We propose Task-agnostic Prompt Compression (TPC), a novel framework that generalizes compression across tasks and domains without requiring input questions or templates. TPC generates a context-relevant task description using a task descriptor trained on a curated dataset of context and query pairs, and fine-tuned via reinforcement learning with a reward function designed to capture the most relevant information. The task descriptor is then used to compute the relevance of each sentence in the prompt and produce the compressed prompt. We introduce three model sizes (Base, Large, and Huge): the largest model outperforms existing state-of-the-art methods on the LongBench and ZeroSCROLLS benchmarks, while the smallest performs comparably to existing solutions despite being considerably smaller.
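The core pipeline described above (generate a task description, score each sentence against it, keep the most relevant sentences) can be sketched as follows. This is a minimal, dependency-free illustration, not the paper's implementation: TPC uses a learned task descriptor and context-aware embeddings, whereas this sketch assumes a pre-supplied task description and substitutes a toy word-overlap relevance score. The function names `relevance` and `compress_prompt` are hypothetical.

```python
import re

def relevance(sentence: str, task_description: str) -> float:
    # Toy relevance: normalized word overlap with the task description.
    # TPC instead computes similarity against a learned, context-aware
    # task descriptor; this stand-in keeps the example self-contained.
    s = set(re.findall(r"\w+", sentence.lower()))
    t = set(re.findall(r"\w+", task_description.lower()))
    return len(s & t) / max(len(s), 1)

def compress_prompt(prompt: str, task_description: str,
                    keep_ratio: float = 0.5) -> str:
    # Split the prompt into sentences, score each against the task
    # description, and keep the top-scoring fraction in original order.
    sentences = re.split(r"(?<=[.!?])\s+", prompt.strip())
    ranked = sorted(range(len(sentences)),
                    key=lambda i: relevance(sentences[i], task_description),
                    reverse=True)
    keep = set(ranked[:max(1, int(len(sentences) * keep_ratio))])
    return " ".join(s for i, s in enumerate(sentences) if i in keep)
```

For example, with the task description "Report revenue and financial growth", sentences mentioning revenue are retained while unrelated sentences are dropped.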