🤖 AI Summary
Whether combinatorial matrix multiplication can break the cubic time barrier—i.e., run in $n^{3-\delta}$ time—remains a fundamental open problem in algorithm design. This paper introduces the first convolution- and Fourier-based framework that reduces matrix multiplication to polynomial multiplication over the integers, bypassing the divide-and-conquer paradigm. Our key innovation is the first integration of CKSU polynomials with low-degree polynomial approximation, breaking the linear precision–speed trade-off inherent in approximate matrix multiplication (AMM); under distributional assumptions, our approximation error falls below that of the best rank-$r$ SVD. Technically, we combine FFT-accelerated multivariate convolution, a Fourier-concentration lemma, and linear sketching. The results include: (i) an exact algorithm running in $O(n^{2.89})$ time, and (ii) an approximate algorithm running in $O(rn^2)$ time with $O(r^{-1.1})$ error—significantly outperforming Krylov methods on Gaussian matrices.
📝 Abstract
A longstanding open question in algorithm design is whether "combinatorial" matrix multiplication algorithms -- avoiding Strassen-like divide-and-conquer -- can achieve truly subcubic runtime $n^{3-\delta}$. We present an $O(n^{2.89})$-time exact algorithm that computes only sums of convolutions in $\mathbb{Z}_m^k$ (multivariate polynomial multiplications) via FFT, building on the work of Cohn, Kleinberg, Szegedy and Umans (CKSU'05). While the algorithm avoids recursion, the asymptotic speedup arises only for impractically large matrices.
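The core idea of turning inner products into convolutions can be illustrated with a minimal sketch (this is *not* the CKSU construction; the function name and encoding below are illustrative only): the inner product $\sum_k a_k b_k$ appears as a single coefficient of the product of two univariate polynomials, which FFT computes in near-linear time.

```python
import numpy as np

def inner_product_via_fft(a, b):
    """Illustrative only: recover <a, b> from one coefficient of a
    polynomial product computed by FFT.

    Encode a as A(x) = sum_i a[i] x^i and the reversal of b as
    B(x) = sum_j b[n-1-j] x^j.  Then the coefficient of x^(n-1) in
    A(x) * B(x) is exactly sum_k a[k] * b[k].
    """
    n = len(a)
    m = 2 * n  # FFT length: at least deg(A) + deg(B) + 1
    fa = np.fft.rfft(a, m)
    fb = np.fft.rfft(b[::-1], m)
    prod = np.fft.irfft(fa * fb, m)  # coefficient vector of A(x) * B(x)
    return int(round(prod[n - 1]))

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
assert inner_product_via_fft(a, b) == int(a @ b)  # both equal 32
```

The paper's algorithm instead uses multivariate convolutions over $\mathbb{Z}_m^k$ (batching many inner products per transform), but the same "dot product = one convolution coefficient" principle is at work.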
Motivated by practical applications, we use this baseline to develop a new framework for fast approximate matrix multiplication (AMM), via low-degree approximations of the CKSU polynomials. We show that combining the aforementioned algorithm with black-box linear sketching already breaks the longstanding linear speed-accuracy tradeoff for AMM (Sarlos'06, Clarkson-Woodruff'13, Pagh'11, Cohn-Lewis'00), achieving $\frac{1}{r^{1.1}}\|\mathbf{A}\|_F^2\|\mathbf{B}\|_F^2$ error in $O(rn^2)$ time.
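For context, the classical linear-sketching baseline that the above tradeoff improves on can be sketched in a few lines (a generic Gaussian sketch in the style of Sarlos'06, not this paper's algorithm; the function name and parameters are illustrative):

```python
import numpy as np

def sketched_matmul(A, B, r, rng):
    """Classical sketched AMM: compress the shared inner dimension.

    S is an r x n Gaussian map scaled so that E[S.T @ S] = I, hence
    (A @ S.T) @ (S @ B) is an unbiased estimator of A @ B with expected
    squared Frobenius error on the order of ||A||_F^2 ||B||_F^2 / r --
    the linear speed-accuracy tradeoff referenced above.
    """
    n = A.shape[1]
    S = rng.standard_normal((r, n)) / np.sqrt(r)
    return (A @ S.T) @ (S @ B)  # costs O(r * n^2) for n x n inputs

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
B = rng.standard_normal((50, 50))
C = sketched_matmul(A, B, 2000, rng)
```

Error decaying as $1/r$ in the squared Frobenius norm is the baseline; the framework above achieves the faster $1/r^{1.1}$ rate at the same $O(rn^2)$ cost.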
Our main result is a low-degree approximation scheme for the CKSU polynomials, based on a Fourier-concentration lemma, yielding substantially smaller error in the distributional setting where $\mathbf{A},\mathbf{B}$ are drawn from an i.i.d. product distribution. For random Gaussian matrices, this practical AMM algorithm attains smaller error than the best rank-$r$ SVD of the output matrix $\mathbf{A}\mathbf{B}$, in time $O(rn^2)$. This is a substantial improvement over iterative Krylov-subspace methods for low-rank approximation. Our theoretical and empirical results suggest the possibility of replacing MatMuls with sums of convolutions in LLM training and inference.