An Empirical Study on Prompt Compression for Large Language Models

📅 2025-04-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational overhead and deployment costs incurred by large language models (LLMs) when processing long prompts, this paper systematically evaluates six prompt compression methods across text and multimodal tasks, achieving simultaneous prompt length reduction, cost savings, and output quality preservation. Methodologically, we introduce the first comprehensive empirical analysis of compression’s impact on hallucination, token omission, and long-context performance—revealing that moderate compression can even improve LLM performance on Longbench long-context benchmarks (up to +2.3%). Our evaluation framework spans 13 heterogeneous datasets—including news, scientific, commonsense reasoning, mathematical, QA, and VQA domains—and incorporates multidimensional metrics: generation quality, hallucination rate, and cross-modal robustness. All code and datasets are publicly released to facilitate reproducibility and community-driven extensions.

Technology Category

Application Category

📝 Abstract
Prompt engineering enables Large Language Models (LLMs) to perform a variety of tasks. However, lengthy prompts significantly increase computational complexity and economic costs. To address this issue, we study six prompt compression methods for LLMs, aiming to reduce prompt length while maintaining LLM response quality. In this paper, we present a comprehensive analysis covering aspects such as generation performance, model hallucinations, efficacy in multimodal tasks, word omission analysis, and more. We evaluate these methods across 13 datasets, including news, scientific articles, commonsense QA, math QA, long-context QA, and VQA datasets. Our experiments reveal that prompt compression has a greater impact on LLM performance in long contexts compared to short ones. In the Longbench evaluation, moderate compression even enhances LLM performance. Our code and data is available at https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression.
Problem

Research questions and friction points this paper is trying to address.

Reducing prompt length for Large Language Models
Maintaining response quality after compression
Evaluating compression impact across diverse datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Studied six prompt compression methods for LLMs
Evaluated methods across 13 diverse datasets
Moderate compression enhances LLM performance
🔎 Similar Papers
No similar papers found.
Z
Zheng Zhang
The Hong Kong University of Science and Technology (Guangzhou)
J
Jinyi Li
South China University of Technology
Yihuai Lan
Yihuai Lan
Research Engineer@SMU
llm agent
X
Xiang Wang
University of Science and Technology of China
H
Hao Wang
The Hong Kong University of Science and Technology (Guangzhou)