Every Bit Counts: A Theoretical Study of Precision-Expressivity Tradeoffs in Quantized Transformers

📅 2026-02-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of theoretical understanding of how quantized Transformers lose expressive power when numerical precision is reduced to accelerate inference. By constructing a target function Γ inspired by the equality function, and combining an explicit finite-precision Transformer construction with communication-complexity lower bounds, the study establishes, for the first time, a sharp theoretical threshold: for every p, Γ can be computed exactly by a one-layer softmax Transformer with p bits of precision, but not with p−1 bits. This single-bit phenomenon precisely characterizes the fine-grained trade-off between numerical precision and representational capacity, offering a rigorous theoretical foundation for task-aware quantization strategies.

📝 Abstract
Quantization reduces the numerical precision of Transformer computations and is widely used to accelerate inference, yet its effect on expressivity remains poorly characterized. We demonstrate a fine-grained theoretical tradeoff between expressivity and precision: for every p, we exhibit a function Γ, inspired by the equality function, and prove that a one-layer softmax Transformer can compute Γ with p bits of precision, but not with p−1 bits. This result concretely explains the widely observed empirical loss of expressivity under quantization. Practically, it suggests that tasks requiring equality-like comparisons (exact match, membership, etc.) are especially sensitive to quantization: dropping even one bit can cross a threshold beyond which the model cannot reliably represent the needed comparison. It thus paves the way for heuristics that help practitioners decide how much quantization is possible: the precision should be chosen as a function of the length of the equality to be checked for the task at hand. Our proofs combine explicit finite-precision Transformer constructions with communication-complexity lower bounds, yielding a tight "one-bit" threshold.
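The flavor of the one-bit threshold can be conveyed with a toy scalar example. This is a minimal sketch, not the paper's Transformer construction or proof: the function names and the floor-quantization scheme here are illustrative assumptions. Two values that differ by one part in 2^p stay distinct on a p-bit grid but collapse onto the same cell when a single bit is dropped, so an equality-style comparison silently fails.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Truncate x in [0, 1) onto a grid of 2**bits levels (floor quantization)."""
    levels = 2 ** bits
    return math.floor(x * levels) / levels

def distinguishable(x: float, y: float, bits: int) -> bool:
    """True if x and y remain distinct after quantizing both to `bits` bits."""
    return quantize(x, bits) != quantize(y, bits)

# Two inputs that differ by exactly one part in 2**4:
x, y = 4 / 16, 5 / 16
print(distinguishable(x, y, bits=4))  # True: 4 bits keep them apart
print(distinguishable(x, y, bits=3))  # False: at 3 bits they collide
```

The paper's result is far stronger than this scalar picture (it concerns what a one-layer softmax Transformer can represent, proved via communication complexity), but the same knife-edge behavior, fine at p bits, broken at p−1, is what the threshold formalizes.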
Problem

Research questions and friction points this paper is trying to address.

quantization
expressivity
precision
Transformers
equality function
Innovation

Methods, ideas, or system contributions that make the work stand out.

quantized Transformers
precision-expressivity tradeoff
theoretical analysis
communication complexity
equality function
Sayak Chakrabarti
Department of Computer Science, Columbia University, New York, NY, USA
T. Pitassi
Department of Computer Science, Columbia University, New York, NY, USA
Josh Alman
Columbia University
Theoretical Computer Science · Algorithms · Complexity