Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction

📅 2025-05-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
The explosive growth of text generated by large language models (LLMs) poses significant challenges for storage and management. Method: This paper introduces the “LLM self-compression” paradigm, which leverages LLMs’ inherent predictability over their own outputs to turn their forward-inference interface directly into a lossless compressor. Specifically, we design an arithmetic coding scheme grounded in next-token probability distributions, integrating entropy estimation and sequence-modeling optimization for end-to-end compression. Contribution/Results: To our knowledge, this is the first systematic proposal and empirical validation of a “model-as-compressor” architecture. Evaluated across 14 mainstream LLMs and 8 diverse generation datasets, our approach achieves an average compression ratio exceeding 20×, more than six times the roughly 3× ratio achieved by gzip, while demonstrating strong robustness across model scales and domain-specific tasks. This work establishes a novel, LLM-native paradigm for efficient generative data management.
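The core mechanism named in the summary, arithmetic coding driven by next-token probability distributions, can be illustrated with a minimal sketch. The toy `next_token_probs` table below is a hypothetical stand-in for an LLM's forward pass (the paper uses real LLMs; the vocabulary, probabilities, and function names here are illustrative assumptions), and exact `Fraction` arithmetic sidesteps the finite-precision renormalization a production coder would need:

```python
from fractions import Fraction

# Hypothetical stand-in for an LLM's next-token distribution. A real
# "model-as-compressor" would get these probabilities from the LLM's
# forward pass; this hand-written table mimics the sharp distributions
# an LLM assigns when re-reading its own generated text.
VOCAB = ["the", "cat", "sat", "<eos>"]

def next_token_probs(context):
    if not context:
        return {"the": Fraction(7, 10), "cat": Fraction(1, 10),
                "sat": Fraction(1, 10), "<eos>": Fraction(1, 10)}
    table = {
        "the": {"the": Fraction(1, 20), "cat": Fraction(4, 5),
                "sat": Fraction(1, 10), "<eos>": Fraction(1, 20)},
        "cat": {"the": Fraction(1, 20), "cat": Fraction(1, 20),
                "sat": Fraction(4, 5), "<eos>": Fraction(1, 10)},
        "sat": {"the": Fraction(1, 10), "cat": Fraction(1, 20),
                "sat": Fraction(1, 20), "<eos>": Fraction(4, 5)},
        "<eos>": {v: Fraction(1, 4) for v in VOCAB},
    }
    return table[context[-1]]

def encode(tokens):
    # Narrow [low, high) by each token's probability slice. The final
    # interval width equals the sequence probability, so about
    # -log2(width) bits suffice to identify the sequence.
    low, high = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        probs = next_token_probs(tokens[:i])
        width = high - low
        cum = Fraction(0)
        for v in VOCAB:
            if v == tok:
                low, high = low + width * cum, low + width * (cum + probs[v])
                break
            cum += probs[v]
    return low, high  # any number in [low, high) encodes the sequence

def decode(code, n):
    # Lossless recovery: replay the same model predictions and pick the
    # slice containing the code at every step.
    low, high = Fraction(0), Fraction(1)
    tokens = []
    for _ in range(n):
        probs = next_token_probs(tokens)
        width = high - low
        cum = Fraction(0)
        for v in VOCAB:
            lo, hi = low + width * cum, low + width * (cum + probs[v])
            if lo <= code < hi:
                tokens.append(v)
                low, high = lo, hi
                break
            cum += probs[v]
    return tokens
```

Because decoding replays the exact same predictive model as encoding, any point inside the final interval reproduces the token sequence bit-for-bit, which is what makes the scheme lossless.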

📝 Abstract
As large language models (LLMs) continue to be deployed and utilized across domains, the volume of LLM-generated data is growing rapidly. This trend highlights the increasing importance of effective and lossless compression for such data in modern text management systems. However, compressing LLM-generated data presents unique challenges compared to traditional human- or machine-generated content. Traditional machine-generated data is typically derived from computational processes or device outputs, often highly structured and limited to low-level elements like labels or numerical values. This structure enables conventional lossless compressors to perform efficiently. In contrast, LLM-generated data is more complex and diverse, requiring new approaches for effective compression. In this work, we conduct the first systematic investigation of lossless compression techniques tailored specifically to LLM-generated data. Notably, because LLMs are trained via next-token prediction, we find that LLM-generated data is highly predictable for the models themselves. This predictability enables LLMs to serve as efficient compressors of their own outputs. Through extensive experiments with 14 representative LLMs and 8 LLM-generated datasets from diverse domains, we show that LLM-based prediction methods achieve remarkable compression rates, exceeding 20x, far surpassing the 3x rate achieved by Gzip, a widely used general-purpose compressor. Furthermore, this advantage holds across different LLM sizes and dataset types, demonstrating the robustness and practicality of LLM-based methods in lossless text compression under generative AI workloads.
Problem

Research questions and friction points this paper is trying to address.

Lossless compression of complex LLM-generated text
Overcoming inefficiency of traditional compressors for LLM outputs
Leveraging LLMs' self-predictive power for high-ratio compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs compress own outputs via next-token prediction
LLM-based methods achieve over 20x compression rates
Outperform Gzip with robust cross-model performance
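The >20x figure in the bullets above follows from a simple information-theoretic estimate: an entropy coder spends about -log2(p) bits per token, so high self-predicted probabilities translate directly into high compression ratios. A back-of-the-envelope sketch (the 0.8 average probability, 4-character tokens, and 8 bits per character are illustrative assumptions, not numbers from the paper):

```python
import math

def estimated_ratio(token_probs, chars_per_token=4, bits_per_char=8):
    """Shannon-optimal estimate: raw text bits vs. -log2(p) coded bits."""
    coded_bits = sum(-math.log2(p) for p in token_probs)
    raw_bits = len(token_probs) * chars_per_token * bits_per_char
    return raw_bits / coded_bits

# An LLM re-reading its own output is typically very confident; at an
# average next-token probability of 0.8, a token costs ~0.32 coded bits
# versus ~32 raw bits, so the estimated ratio clears 20x easily.
confident = estimated_ratio([0.8] * 100)   # illustrative, not measured
uncertain = estimated_ratio([0.5] * 100)   # 1 bit/token -> exactly 32x
```

This also explains the robustness claim: as long as the model stays confident on its own outputs, the ratio stays high regardless of model scale or domain.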