When Less is More: The LLM Scaling Paradox in Context Compression

📅 2026-02-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the counterintuitive "Size-Fidelity Paradox" in context compression, where increasing the compressor's model size degrades reconstruction fidelity. Through systematic experiments across compressor-decoder architectures with models ranging from 0.6B to 90B parameters, the authors uncover a negative correlation between model scale and contextual faithfulness. They attribute the phenomenon to two underlying mechanisms, "knowledge overwriting" and "semantic drift", tracing both to the excessive semantic capacity and amplified generation uncertainty that accompany scale. These claims are substantiated via representational analyses, including the rank of context embeddings and the entropy of token prediction distributions. The work further examines emergent properties of compressed representations, demonstrating that conventional scaling laws break down on tasks requiring faithful preservation of open-ended generative contexts.
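The compressor-decoder setup studied here can be pictured with a minimal sketch: a compressor squeezes a long context into a handful of soft memory vectors, and a decoder is then asked to reconstruct the original text from them. The module and dimension names below (SoftCompressor, n_mem, d_model) are illustrative assumptions, not the authors' implementation.

```python
# Minimal, illustrative compressor-decoder sketch (not the paper's code).
# A compressor maps T context token embeddings to n_mem "soft" memory vectors;
# a decoder would then be conditioned on these vectors to reconstruct the context.
import torch
import torch.nn as nn

class SoftCompressor(nn.Module):  # hypothetical module name
    def __init__(self, d_model: int = 512, n_mem: int = 8, n_heads: int = 8):
        super().__init__()
        # Learned memory queries that cross-attend to the context tokens.
        self.mem_queries = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, ctx_emb: torch.Tensor) -> torch.Tensor:
        # ctx_emb: (batch, T, d_model) -> compressed memory: (batch, n_mem, d_model)
        q = self.mem_queries.unsqueeze(0).expand(ctx_emb.size(0), -1, -1)
        mem, _ = self.cross_attn(q, ctx_emb, ctx_emb)
        return mem

# Toy usage: the decoder (any causal LM) sees `mem` instead of the full context
# and is trained with a reconstruction loss over the original tokens.
compressor = SoftCompressor()
ctx_emb = torch.randn(2, 128, 512)   # stand-in for context embeddings
mem = compressor(ctx_emb)            # (2, 8, 512) compressed representation
print(mem.shape)
```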

📝 Abstract
Scaling up model parameters has long been a prevalent training paradigm, driven by the assumption that larger models yield superior generation capabilities. However, under lossy context compression in a compressor-decoder setup, we observe a Size-Fidelity Paradox: increasing the compressor size can reduce the faithfulness of reconstructed contexts even as training loss decreases. Through extensive experiments across models from 0.6B to 90B parameters, we trace this paradox to two dominant factors: 1) knowledge overwriting: larger models increasingly replace source facts with their own prior beliefs, e.g., "the white strawberry" → "the red strawberry"; and 2) semantic drift: larger models tend to paraphrase or restructure content instead of reproducing it verbatim, e.g., "Alice hit Bob" → "Bob hit Alice". Holding model size fixed, we then examine the emergent properties of compressed context representations and show that the culprit is not parameter count itself, but the excessive semantic capacity and amplified generative uncertainty that accompany scaling. Specifically, the increased rank of context embeddings facilitates prior-knowledge intrusion, whereas higher entropy over token prediction distributions promotes rewriting. Our results complement existing evaluations of the context compression paradigm and underpin a breakdown of scaling laws for faithful preservation in open-ended generation.
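The two diagnostics the abstract leans on, the rank of compressed context embeddings and the entropy of the decoder's token prediction distribution, are both straightforward to probe. The sketch below shows one plausible way to compute them from a batch of memory vectors and decoder logits; the tolerance threshold and tensor shapes are assumptions for illustration, not the paper's exact measurement protocol.

```python
# Illustrative diagnostics for the two mechanisms discussed above
# (assumed implementation, not the authors' exact protocol).
import torch

def effective_rank(mem: torch.Tensor, tol: float = 1e-3) -> int:
    """Numerical rank of compressed context embeddings (n_mem, d_model):
    the number of singular values above tol * largest singular value."""
    s = torch.linalg.svdvals(mem)
    return int((s > tol * s[0]).sum().item())

def prediction_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean entropy (in nats) of the decoder's next-token distributions.
    logits: (seq_len, vocab_size). Higher entropy -> more freedom to rewrite."""
    log_p = torch.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1).mean()

# Toy usage with random tensors standing in for real model outputs.
mem = torch.randn(8, 512)          # 8 memory vectors of width 512
logits = torch.randn(64, 32000)    # 64 decoding steps over a 32k vocabulary
print(effective_rank(mem), prediction_entropy(logits).item())
```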
Problem

Research questions and friction points this paper is trying to address.

context compression
scaling paradox
faithfulness
semantic drift
knowledge overwriting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Size-Fidelity Paradox
context compression
knowledge overwriting
semantic drift
scaling laws
Ruishan Guo
Baidu Inc., Beijing, China; Shenzhen International Graduate School, Tsinghua University
Yibing Liu
Researcher, Baidu
LLM Acceleration · Trustworthy ML
Guoxin Ma
Baidu Inc., Beijing, China; Xi’an Jiaotong University
Yan Wang
Baidu Inc., Beijing, China
Yueyang Zhang
Baidu Inc., Beijing, China
Long Xia
Research Scientist, Baidu
information retrieval · data mining · applied machine learning · recommender systems
Kecheng Chen
PhD student at EE, City University of Hong Kong
Transfer Learning · AI for Healthcare · Signal Processing
Zhiyuan Sun
Baidu Inc., Beijing, China
Daiting Shi
Baidu Inc., Beijing, China