🤖 AI Summary
This work addresses the lack of theoretical understanding of how quantized Transformers lose expressive power when reduced numerical precision is used to accelerate inference. By constructing a target function Γ inspired by the equality function, and combining an explicit finite-precision Transformer construction with communication-complexity lower bounds, the study establishes, for the first time, a sharp theoretical threshold: there exists a critical precision p such that Γ can be computed exactly with p bits of precision, but not with p−1 bits. This single-bit critical phenomenon precisely characterizes the fine-grained trade-off between numerical precision and representational capacity, offering a rigorous theoretical foundation for task-aware quantization strategies.
📝 Abstract
Quantization reduces the numerical precision of Transformer computations and is widely used to accelerate inference, yet its effect on expressivity remains poorly characterized. We demonstrate a fine-grained theoretical tradeoff between expressivity and precision: for every p we exhibit a function Γ, inspired by the equality function, and prove that a one-layer softmax Transformer can compute Γ with p bits of precision, but not with p−1 bits of precision. This result concretely explains the widely observed empirical loss of expressivity when quantization is used. Practically, it suggests that tasks requiring equality-like comparisons (exact match, membership, etc.) are especially sensitive to quantization: dropping even one bit can cross a threshold beyond which the model cannot reliably represent the needed comparison. It thus paves the way for heuristics that help practitioners decide how much quantization is possible: the precision should be chosen as a function of the length of the equality to be checked for the specific task. Our proofs combine explicit finite-precision Transformer constructions with communication-complexity lower bounds, yielding a tight "one-bit" threshold.
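The intuition behind the one-bit threshold can be illustrated with a toy pigeonhole argument (this is an illustrative sketch, not the paper's actual construction of Γ or its Transformer): if a vocabulary of 2^p distinct values must pass through a p−1-bit uniform quantizer, two distinct values necessarily collide, so any equality comparison computed downstream of the quantizer must err on some pair.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize values in [0, 1) onto a grid of 2**bits levels."""
    levels = 2 ** bits
    return np.floor(x * levels) / levels

# Toy "vocabulary": 2**p distinct embeddings, evenly spaced in [0, 1).
p = 4
tokens = np.arange(2 ** p) / 2 ** p

# With p bits, every token keeps a distinct code, so an equality check
# on the quantized values agrees with equality on the originals.
q_full = quantize(tokens, p)
print(len(set(q_full)))      # 16 distinct codes survive

# With p - 1 bits, the pigeonhole principle forces collisions: distinct
# tokens share a code, so equality can no longer be decided reliably.
q_low = quantize(tokens, p - 1)
print(len(set(q_low)))       # only 8 codes remain
print(q_low[0] == q_low[1])  # True: two distinct tokens now compare equal
```

The paper's result is far stronger than this counting argument (it pins down a softmax-Transformer construction at p bits and a communication-complexity impossibility at p−1 bits), but the sketch shows why equality-like tasks sit on a hard precision cliff rather than degrading gracefully.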