Context Cascade Compression: Exploring the Upper Limits of Text Compression

πŸ“… 2025-11-19
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the computational and memory bottlenecks posed by million-token text inputs, this paper proposes C3, a cascaded compression framework: a small language model first compresses raw text into an extremely short sequence of latent tokens, which a large language model then accurately decodes to reconstruct the semantic content. This two-stage collaborative architecture achieves, for the first time in a pure-text pipeline, lossless or near-lossless compression at ratios of 20×–40×, surpassing the performance ceiling of traditional OCR-based approaches. Experiments show decoding accuracy of 98% at 20× compression and 93% at 40×, substantially outperforming DeepSeek-OCR (approximately 60%). C3 not only empirically validates the theoretical potential of text compression but also establishes a novel paradigm of cooperative compression between small and large language models, offering an efficient and practical pathway for long-context modeling.

πŸ“ Abstract
Million-token inputs in long-context tasks pose significant computational and memory challenges for Large Language Models (LLMs). Recently, DeepSeek-OCR conducted research into the feasibility of Contexts Optical Compression and achieved preliminary results. Inspired by this, we introduce Context Cascade Compression (C3) to explore the upper limits of text compression. Our method cascades two LLMs of different sizes to handle the compression and decoding tasks. Specifically, a small LLM, acting as the first stage, performs text compression by condensing a long context into a set of latent tokens (e.g., 32 or 64 in length), achieving a high ratio of text tokens to latent tokens. A large LLM, as the second stage, then executes the decoding task on this compressed context. Experiments show that at a 20x compression ratio (where the number of text tokens is 20 times the number of latent tokens), our model achieves 98% decoding accuracy, compared to approximately 60% for DeepSeek-OCR. When we further increase the compression ratio to 40x, accuracy is maintained at around 93%. This indicates that in the domain of context compression, C3 demonstrates superior performance and feasibility over optical character compression. C3 uses a simpler, pure-text pipeline that avoids factors such as layout, color, and the information loss introduced by a visual encoder. This also suggests a potential upper bound on compression ratios for future work on optical character compression, OCR, and related fields. Code and model weights are publicly available at https://github.com/liufanfanlff/C3-Context-Cascade-Compression.
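The compression ratio used throughout the abstract is simply the number of original text tokens divided by the number of latent tokens. A minimal arithmetic sketch (function name and token counts are illustrative, not from the paper's released code):

```python
def compression_ratio(num_text_tokens: int, num_latent_tokens: int) -> float:
    """Ratio of original text tokens to compressed latent tokens."""
    return num_text_tokens / num_latent_tokens

# A 640-token passage condensed into 32 latent tokens gives a 20x ratio;
# the same passage condensed into 16 latents gives 40x.
print(compression_ratio(640, 32))  # -> 20.0
print(compression_ratio(640, 16))  # -> 40.0
```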
Problem

Research questions and friction points this paper is trying to address.

Compressing long text contexts to reduce computational demands
Achieving high-accuracy decoding from highly compressed latent tokens
Exploring the upper limits of compression ratios for optical character methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascades two LLMs of different sizes
Compresses long context into latent tokens
Achieves high accuracy at high compression ratios
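The cascade described above can be sketched as a two-stage pipeline. The class and method names below (`SmallLM.compress`, `LargeLM.decode`) are hypothetical stand-ins for the paper's models, with placeholder bodies showing only the data flow, not a real implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LatentContext:
    """Compressed representation: a short sequence of latent token vectors."""
    latents: List[List[float]]  # e.g. 32 or 64 learned embeddings

class SmallLM:
    """Stage 1 (hypothetical): condenses a long context into a few latent tokens."""
    def compress(self, text_tokens: List[int], num_latents: int = 32) -> LatentContext:
        # Placeholder: a real model would attend over the full context
        # and emit learned latent embeddings.
        dim = 8
        return LatentContext(latents=[[0.0] * dim for _ in range(num_latents)])

class LargeLM:
    """Stage 2 (hypothetical): reconstructs text from the compressed context."""
    def decode(self, ctx: LatentContext) -> List[int]:
        # Placeholder: a real model would autoregressively regenerate the text
        # conditioned on the latent tokens.
        return []

# Usage: 640 text tokens -> 32 latent tokens is a 20x compression ratio.
text_tokens = list(range(640))
ctx = SmallLM().compress(text_tokens, num_latents=32)
reconstruction = LargeLM().decode(ctx)
print(len(text_tokens) / len(ctx.latents))  # -> 20.0
```

The key design point is the asymmetry: a cheap small model pays the cost of reading the full context once, while the large model only ever sees the short latent sequence.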
πŸ”Ž Similar Papers
No similar papers found.