🤖 AI Summary
Whether combinatorial matrix multiplication can break the cubic time barrier—i.e., run in $n^{3-\delta}$ time—remains a fundamental open problem in algorithm design. This paper introduces the first convolution- and Fourier-based framework that reduces matrix multiplication to polynomial multiplication over the integers, bypassing the divide-and-conquer paradigm. Our key innovation is the first integration of CKSU polynomials with low-degree polynomial approximation, breaking the linear precision–speed trade-off inherent in approximate matrix multiplication (AMM); under distributional assumptions, our approximation error falls below that of the best rank-$r$ SVD. Technically, we combine FFT-accelerated multivariate convolution, a Fourier-concentration lemma, and linear sketching. The results include: (i) an exact algorithm running in $O(n^{2.89})$ time, and (ii) an approximate algorithm running in $O(rn^2)$ time with $O(r^{-1.1})$ error—significantly outperforming Krylov methods on Gaussian matrices.
📝 Abstract
A longstanding open question in algorithm design is whether "combinatorial" matrix multiplication algorithms -- avoiding Strassen-like divide-and-conquer -- can achieve truly subcubic runtime $n^{3-\delta}$. We present an $O(n^{2.89})$-time exact algorithm that computes only sums of convolutions in $\mathbb{Z}_m^k$ (multivariate polynomial multiplications) via FFT, building on the work of Cohn, Kleinberg, Szegedy and Umans (CKSU'05). While the algorithm avoids recursion, the asymptotic speedup arises only for impractically large matrices.
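The core idea of turning inner products into convolutions can be illustrated with a minimal sketch (this is *not* the CKSU construction; the function name and encoding below are illustrative only): the inner product $\sum_k a_k b_k$ appears as a single coefficient of the product of two univariate polynomials, which FFT computes in near-linear time.

```python
import numpy as np

def inner_product_via_fft(a, b):
    """Illustrative only: recover <a, b> from one coefficient of a
    polynomial product computed by FFT.

    Encode a as A(x) = sum_i a[i] x^i and the reversal of b as
    B(x) = sum_j b[n-1-j] x^j.  Then the coefficient of x^(n-1) in
    A(x) * B(x) is exactly sum_k a[k] * b[k].
    """
    n = len(a)
    m = 2 * n  # FFT length: at least deg(A) + deg(B) + 1
    fa = np.fft.rfft(a, m)
    fb = np.fft.rfft(b[::-1], m)
    prod = np.fft.irfft(fa * fb, m)  # coefficient vector of A(x) * B(x)
    return int(round(prod[n - 1]))

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
assert inner_product_via_fft(a, b) == int(a @ b)  # both equal 32
```

The paper's algorithm instead uses multivariate convolutions over $\mathbb{Z}_m^k$ (batching many inner products per transform), but the same "dot product = one convolution coefficient" principle is at work.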
Motivated by practical applications, we use this baseline to develop a new framework for fast approximate matrix multiplication (AMM), via low-degree approximations of the CKSU polynomials. We show that combining the aforementioned algorithm with black-box linear sketching already breaks the longstanding linear speed-accuracy tradeoff for AMM (Sarlos'06, Clarkson-Woodruff'13, Pagh'11, Cohn-Lewis'00), achieving $\frac{1}{r^{1.1}}\|\mathbf{A}\|_F^2\|\mathbf{B}\|_F^2$ error in $O(rn^2)$ time.
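For context, the classical linear-sketching baseline that the above tradeoff improves on can be sketched in a few lines (a generic Gaussian sketch in the style of Sarlos'06, not this paper's algorithm; the function name and parameters are illustrative):

```python
import numpy as np

def sketched_matmul(A, B, r, rng):
    """Classical sketched AMM: compress the shared inner dimension.

    S is an r x n Gaussian map scaled so that E[S.T @ S] = I, hence
    (A @ S.T) @ (S @ B) is an unbiased estimator of A @ B with expected
    squared Frobenius error on the order of ||A||_F^2 ||B||_F^2 / r --
    the linear speed-accuracy tradeoff referenced above.
    """
    n = A.shape[1]
    S = rng.standard_normal((r, n)) / np.sqrt(r)
    return (A @ S.T) @ (S @ B)  # costs O(r * n^2) for n x n inputs

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
B = rng.standard_normal((50, 50))
C = sketched_matmul(A, B, 2000, rng)
```

Error decaying as $1/r$ in the squared Frobenius norm is the baseline; the framework above achieves the faster $1/r^{1.1}$ rate at the same $O(rn^2)$ cost.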
Our main result is a low-degree approximation scheme for the CKSU polynomials, based on a Fourier-concentration lemma, yielding substantially smaller error in the distributional setting where $\mathbf{A},\mathbf{B}$ are drawn from an i.i.d. product distribution. For random Gaussian matrices, this practical AMM algorithm attains smaller error than the best rank-$r$ SVD of the output matrix $\mathbf{A}\mathbf{B}$, in time $O(rn^2)$. This is a substantial improvement over iterative Krylov-subspace methods for low-rank approximation. Our theoretical and empirical results suggest the possibility of replacing MatMuls with sums of convolutions in LLM training and inference.