WaterSIC: information-theoretically (near) optimal linear layer quantization

📅 2026-03-05
🤖 AI Summary
This study addresses the rate-distortion trade-off between compressed size and output distortion in low-precision quantization of dense linear layers. Drawing on information-theoretic analysis, the authors propose a quantization algorithm, WaterSIC, that allocates different bit rates to different columns (in-features) of the weight matrix, emulating the classical "waterfilling" strategy to approach the information-theoretic limit. Integrated within an optimization framework inspired by GPTQ, the method achieves state-of-the-art performance on Llama and Qwen large language models at quantization rates from 1 to 4 bits. Its rate is shown to be within 0.255 bits of the information-theoretic limit, uniformly over all covariance matrices of the input activations.

📝 Abstract
This paper considers the problem of converting a given dense linear layer to low precision. The tradeoff between compressed length and output discrepancy is analyzed information-theoretically (IT). It is shown that the popular GPTQ algorithm may have an arbitrarily large gap to the IT limit. To alleviate this problem, a novel algorithm, termed "WaterSIC", is proposed and is shown to be within a rate gap of 0.255 bits to the IT limit, uniformly over all possible covariance matrices of input activations. The key innovation of WaterSIC is to allocate different quantization rates to different columns (in-features) of the weight matrix, mimicking the classical IT solution known as "waterfilling". Applying WaterSIC to the Llama and Qwen families of LLMs establishes new state-of-the-art performance for all quantization rates from 1 to 4 bits.
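
As a rough illustration of the "waterfilling" idea the abstract refers to, the sketch below implements the classical textbook reverse water-filling rate allocation for independent Gaussian components under mean-squared error. It is not the WaterSIC algorithm itself, whose details are not given on this page. The function name `reverse_waterfill`, its parameters, and the example variances are hypothetical and chosen only for illustration.

```python
import math


def reverse_waterfill(variances, total_rate_bits, iters=200):
    """Classical reverse water-filling for independent Gaussian components.

    Illustrative sketch only (not the paper's method). Returns
    (theta, rates, distortions), where theta is the water level,
    rates[i] = max(0, 0.5 * log2(variances[i] / theta)) bits, and
    distortions[i] = min(theta, variances[i]).
    """
    if not variances or any(v <= 0 for v in variances):
        raise ValueError("variances must be a non-empty list of positive numbers")
    if total_rate_bits < 0:
        raise ValueError("total_rate_bits must be non-negative")

    def rates_at(theta):
        # Components with variance below the water level get zero bits.
        return [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]

    # Total rate decreases as the water level rises, so bisect on theta.
    # At hi every component gets zero rate; at lo every component gets
    # at least total_rate_bits, so the target rate is bracketed.
    lo = min(variances) * 2.0 ** (-2.0 * total_rate_bits)
    hi = max(variances)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint, theta spans many scales
        if sum(rates_at(mid)) > total_rate_bits:
            lo = mid  # spending too many bits: raise the water level
        else:
            hi = mid
    theta = hi
    return theta, rates_at(theta), [min(theta, v) for v in variances]


theta, rates, distortions = reverse_waterfill([4.0, 1.0, 0.25], total_rate_bits=3.0)
print(theta, rates, distortions)
```

With the example variances and a 3-bit total budget, the water level settles at approximately 0.25 and the rates at approximately 2, 1, and 0 bits: high-variance components receive more bits, and components at or below the water level receive none. This unequal allocation is the behavior that a per-column rate assignment is meant to mimic.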

Problem

Research questions and friction points this paper is trying to address.

linear layer quantization
information-theoretic limit
low-precision compression
output discrepancy
quantization rate allocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

WaterSIC
linear layer quantization
information-theoretic optimization
waterfilling
low-bit LLMs

Authors

Egor Lifar
MIT
Semyon Savkin
Independent Researcher
Or Ordentlich
Hebrew University of Jerusalem
Yury Polyanskiy
MIT