Context Cascade Compression: Exploring the Upper Limits of Text Compression

πŸ“… 2025-11-19
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the computational and memory bottlenecks posed by million-token text inputs, this paper proposes C3, a cascaded compression framework: a small language model first compresses raw text into an extremely short sequence of latent tokens, which a large language model then accurately decodes to reconstruct the semantic content. This two-stage collaborative architecture achieves, for the first time in a pure-text pipeline, lossless or near-lossless compression at ratios of 20×–40×, surpassing the performance ceiling of traditional OCR-based approaches. Experiments show decoding accuracy of 98% at 20× compression and 93% at 40×, substantially outperforming DeepSeek-OCR (approximately 60%). C3 not only empirically validates the theoretical potential of text compression but also establishes a novel paradigm of cooperative compression between small and large language models, offering an efficient and practical pathway for long-context modeling.

πŸ“ Abstract
Million-token inputs in long-context tasks pose significant computational and memory challenges for Large Language Models (LLMs). Recently, DeepSeek-OCR conducted research into the feasibility of Contexts Optical Compression and achieved preliminary results. Inspired by this, we introduce Context Cascade Compression (C3) to explore the upper limits of text compression. Our method cascades two LLMs of different sizes to handle the compression and decoding tasks. Specifically, a small LLM, acting as the first stage, performs text compression by condensing a long context into a set of latent tokens (e.g., 32 or 64 in length), achieving a high ratio of text tokens to latent tokens. A large LLM, as the second stage, then executes the decoding task on this compressed context. Experiments show that at a 20x compression ratio (where the number of text tokens is 20 times the number of latent tokens), our model achieves 98% decoding accuracy, compared to approximately 60% for DeepSeek-OCR. When we further increase the compression ratio to 40x, accuracy is maintained at around 93%. This indicates that in the domain of context compression, C3 demonstrates superior performance and feasibility over optical character compression. C3 uses a simpler, pure-text pipeline that avoids factors such as layout, color, and the information loss introduced by a visual encoder. This also suggests a potential upper bound on compression ratios for future work on optical character compression, OCR, and related fields. Code and model weights are publicly available at https://github.com/liufanfanlff/C3-Context-Cascade-Compression.
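The compression ratio used throughout the abstract is simply the number of original text tokens divided by the number of latent tokens. A minimal arithmetic sketch (function name and token counts are illustrative, not from the paper's released code):

```python
def compression_ratio(num_text_tokens: int, num_latent_tokens: int) -> float:
    """Ratio of original text tokens to compressed latent tokens."""
    return num_text_tokens / num_latent_tokens

# A 640-token passage condensed into 32 latent tokens gives a 20x ratio;
# the same passage condensed into 16 latents gives 40x.
print(compression_ratio(640, 32))  # -> 20.0
print(compression_ratio(640, 16))  # -> 40.0
```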
Problem

Research questions and friction points this paper is trying to address.

Compressing long text contexts to reduce computational demands
Achieving high-accuracy decoding from highly compressed latent tokens
Exploring the upper limits of compression ratios for optical character methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascades two LLMs of different sizes
Compresses long context into latent tokens
Achieves high accuracy at high compression ratios
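The cascade described above can be sketched as a two-stage pipeline. The class and method names below (`SmallLM.compress`, `LargeLM.decode`) are hypothetical stand-ins for the paper's models, with placeholder bodies showing only the data flow, not a real implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LatentContext:
    """Compressed representation: a short sequence of latent token vectors."""
    latents: List[List[float]]  # e.g. 32 or 64 learned embeddings

class SmallLM:
    """Stage 1 (hypothetical): condenses a long context into a few latent tokens."""
    def compress(self, text_tokens: List[int], num_latents: int = 32) -> LatentContext:
        # Placeholder: a real model would attend over the full context
        # and emit learned latent embeddings.
        dim = 8
        return LatentContext(latents=[[0.0] * dim for _ in range(num_latents)])

class LargeLM:
    """Stage 2 (hypothetical): reconstructs text from the compressed context."""
    def decode(self, ctx: LatentContext) -> List[int]:
        # Placeholder: a real model would autoregressively regenerate the text
        # conditioned on the latent tokens.
        return []

# Usage: 640 text tokens -> 32 latent tokens is a 20x compression ratio.
text_tokens = list(range(640))
ctx = SmallLM().compress(text_tokens, num_latents=32)
reconstruction = LargeLM().decode(ctx)
print(len(text_tokens) / len(ctx.latents))  # -> 20.0
```

The key design point is the asymmetry: a cheap small model pays the cost of reading the full context once, while the large model only ever sees the short latent sequence.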
πŸ”Ž Similar Papers
No similar papers found.