🤖 AI Summary
Large language models (LLMs) incur high computational overhead and deployment costs when processing long prompts. To address this, the paper systematically evaluates six prompt compression methods across text and multimodal tasks, showing that prompt length and cost can be reduced while output quality is preserved. Methodologically, it presents the first comprehensive empirical analysis of compression's impact on hallucination, token omission, and long-context performance, revealing that moderate compression can even improve LLM performance on Longbench long-context benchmarks (up to +2.3%). The evaluation framework spans 13 heterogeneous datasets, covering news, scientific, commonsense reasoning, mathematical, QA, and VQA domains, and incorporates multidimensional metrics: generation quality, hallucination rate, and cross-modal robustness. All code and datasets are publicly released to facilitate reproducibility and community-driven extensions.
📝 Abstract
Prompt engineering enables Large Language Models (LLMs) to perform a variety of tasks. However, lengthy prompts significantly increase computational complexity and economic costs. To address this issue, we study six prompt compression methods for LLMs, aiming to reduce prompt length while maintaining LLM response quality. In this paper, we present a comprehensive analysis covering aspects such as generation performance, model hallucinations, efficacy in multimodal tasks, word omission analysis, and more. We evaluate these methods across 13 datasets, including news, scientific articles, commonsense QA, math QA, long-context QA, and VQA datasets. Our experiments reveal that prompt compression has a greater impact on LLM performance in long contexts than in short ones. In the Longbench evaluation, moderate compression even enhances LLM performance. Our code and data are available at https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression.
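To illustrate the general idea of prompt compression (this is a hedged, minimal sketch, not one of the six methods evaluated in the paper or the API of the released toolkit): an extractive compressor can score sentences by average word rarity and keep the most informative ones until a token budget is reached.

```python
# Illustrative extractive prompt compression: keep the highest-information
# sentences under a word budget. The scoring heuristic (inverse word
# frequency) is an assumption for this sketch, not the paper's method.
import re
from collections import Counter

def compress_prompt(prompt: str, keep_ratio: float = 0.5) -> str:
    # Split into sentences on terminal punctuation followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', prompt.strip())
    words = re.findall(r'\w+', prompt.lower())
    freq = Counter(words)
    budget = int(len(words) * keep_ratio)

    def score(sent: str) -> float:
        toks = re.findall(r'\w+', sent.lower())
        if not toks:
            return 0.0
        # Rarer words carry more information, so weight by inverse frequency.
        return sum(1.0 / freq[t] for t in toks) / len(toks)

    # Greedily keep the highest-scoring sentences within the budget.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    kept, used = set(), 0
    for i in ranked:
        n = len(re.findall(r'\w+', sentences[i]))
        if used + n <= budget or not kept:
            kept.add(i)
            used += n
    # Restore original sentence order to preserve coherence.
    return ' '.join(sentences[i] for i in sorted(kept))
```

Real methods studied in this line of work are considerably more sophisticated (e.g., using a small LM to estimate per-token information), but they share this structure: rank prompt units by information content and drop the rest.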