Optimal Scalar Quantization for Matrix Multiplication: Closed-Form Density and Phase Transition

📅 2026-03-19
🤖 AI Summary
This work studies entrywise scalar quantization of two matrices that are quantized independently before multiplication, with the goal of minimizing the mean-squared error (MSE) of the quantized product in the Frobenius norm. Using high-resolution quantization theory, the authors derive a $K^{-2}$ asymptotic expansion of the MSE and characterize the asymptotically optimal quantization point densities in terms of conditional second moments. For correlated Gaussian entry pairs they obtain a closed-form optimal point density, which exhibits a phase transition governed by the correlation coefficient $\rho$: the density is unimodal for $|\rho|\le 1/\sqrt{3}$ and bimodal above that threshold. Experiments cover synthetic matrix multiplication and least-squares optimization, as well as quantization of key and query activations in large language models.

📝 Abstract
We study entrywise scalar quantization of two matrices prior to multiplication. Given $A\in\mathbb{R}^{m\times k}$ and $B\in\mathbb{R}^{k\times n}$, we quantize the entries of $A$ and $B$ independently using scalar quantizers with $K_X$ and $K_Y$ levels per entry, and form $\widehat C=\widehat A\,\widehat B$. The objective is to minimize the matrix multiplication mean-squared error (MSE) $\mathcal{E}=\mathbb{E}\bigl[\|AB-\widehat A\widehat B\|_F^2\bigr]$ under a pair-i.i.d. inner-product model. In the high-resolution regime $K_X,K_Y\to\infty$, we derive a sharp $K^{-2}$ asymptotic expansion for $\mathcal{E}$, identify the exact optimal leading constants, and characterize asymptotically optimal quantization center densities in terms of conditional second moments. We then specialize to correlated Gaussian multiplicative pairs, obtaining a closed-form optimal point density
\[ \lambda^\star(u)\ \propto\ \exp\!\left(-\frac{u^2}{6}\right)\bigl((1-\rho^2)+\rho^2u^2\bigr)^{1/3}, \qquad u=\frac{x}{\sigma_X}, \]
with the same form for $y/\sigma_Y$, and prove a correlation-driven phase transition: the density is unimodal with its peak at the origin for $|\rho|\leq 1/\sqrt{3}$ and becomes bimodal for $|\rho|>1/\sqrt{3}$, with peaks at $u_{\mathrm{peak}}=\pm\sqrt{3-1/\rho^2}$. We demonstrate the method on synthetic matrix multiplication quantization and least-squares optimization, as well as on quantization of large language model key and query activations.
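The phase transition stated in the abstract can be checked directly from the density formula. Setting the derivative of $\log\lambda^\star(u)$ to zero gives either $u=0$ or $u^2 = 3 - 1/\rho^2$, and the second solution is real and nonzero only when $|\rho|>1/\sqrt{3}$. The following is a minimal numerical sketch of that check, not the authors' code: the function names `optimal_density`, `predicted_peaks`, and `numerical_peaks`, the grid range, and the grid resolution are all illustrative assumptions.

```python
import numpy as np

def optimal_density(u, rho):
    """Unnormalized point density exp(-u^2/6) * ((1-rho^2) + rho^2 u^2)^(1/3)."""
    u = np.asarray(u, dtype=float)
    return np.exp(-u**2 / 6.0) * ((1.0 - rho**2) + rho**2 * u**2) ** (1.0 / 3.0)

def predicted_peaks(rho):
    """Peak locations from the closed form: 0 below the threshold, +-sqrt(3 - 1/rho^2) above."""
    if abs(rho) <= 1.0 / np.sqrt(3.0):
        return np.array([0.0])
    r = np.sqrt(3.0 - 1.0 / rho**2)
    return np.array([-r, r])

def numerical_peaks(rho, grid):
    """Strict local maxima of the density on a grid (hypothetical helper for the check)."""
    lam = optimal_density(grid, rho)
    interior = (lam[1:-1] > lam[:-2]) & (lam[1:-1] > lam[2:])
    return grid[1:-1][interior]

# Assumed grid: symmetric around 0 with spacing 0.001.
grid = np.linspace(-6.0, 6.0, 12001)

for rho in (0.3, 0.9):  # one value below and one above 1/sqrt(3) ~ 0.577
    print(f"rho={rho}: predicted {predicted_peaks(rho)}, grid search {numerical_peaks(rho, grid)}")
```

The grid search only locates peaks to within the grid spacing, so agreement with the closed form is expected up to roughly that tolerance. The density is left unnormalized because peak locations do not depend on the normalizing constant.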
Problem

Research questions and friction points this paper is trying to address.

scalar quantization
matrix multiplication
mean-squared error
optimal quantization
phase transition
Innovation

Methods, ideas, or system contributions that make the work stand out.

scalar quantization
matrix multiplication
asymptotic analysis
optimal point density
phase transition