🤖 AI Summary
High-order tensor-weighted neural networks—such as the Fourier Neural Operator (FNO)—suffer from explosive memory consumption and inefficient training in scientific computing due to their high-dimensional parameter spaces.
Method: This paper proposes an embedded gradient tensor decomposition optimization framework that, for the first time, integrates Tucker and CP decompositions directly into the optimization process. It performs low-rank gradient approximation and structure-preserving low-rank updates entirely within the tensor space, with theoretical convergence guarantees. The method requires no architectural modifications and is fully compatible with mainstream FNO variants.
Results: Evaluated on PDE-solving tasks—including Navier–Stokes and Darcy Flow—the framework reduces GPU memory usage by up to 75% while preserving full accuracy. It significantly enhances scalability and training efficiency of high-fidelity scientific AI models without compromising solution quality.
📝 Abstract
We present Tensor-GaLore, a novel method for efficient training of neural networks with higher-order tensor weights. Many models, particularly those used in scientific computing, employ tensor-parameterized layers to capture complex, multidimensional relationships. Scaling these methods to high-resolution problems makes memory usage grow intractably, and matrix-based optimization methods lead to suboptimal performance and compression. We propose to work directly in the high-order complex tensor parameter space, applying a tensor factorization to the gradients during optimization. We showcase its effectiveness on Fourier Neural Operators (FNOs), a class of models crucial for solving partial differential equations (PDEs), and provide theoretical guarantees for the method. Across various PDE tasks, such as the Navier–Stokes and Darcy Flow equations, Tensor-GaLore achieves substantial memory savings, reducing optimizer memory usage by up to 75%. These substantial memory savings across AI for science demonstrate Tensor-GaLore's potential.
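The core mechanism described above — factorizing the gradient tensor and keeping optimizer state in a compressed core space — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it uses a one-shot Tucker decomposition via HOSVD (truncated SVD of each mode unfolding) on a single real-valued gradient tensor, and the function names (`tucker_factors`, `project`, `project_back`) are illustrative, not the paper's API.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the chosen mode to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_factors(G, ranks):
    """HOSVD: leading left singular vectors of each mode unfolding."""
    return [np.linalg.svd(unfold(G, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def project(G, factors):
    """Compress a full gradient tensor into its small Tucker core."""
    C = G
    for m, U in enumerate(factors):
        C = mode_product(C, U.T, m)
    return C

def project_back(C, factors):
    """Expand a low-rank core back to the full gradient shape for the weight update."""
    G = C
    for m, U in enumerate(factors):
        G = mode_product(G, U, m)
    return G

# Toy 4th-order "FNO-style" gradient tensor and chosen Tucker ranks.
rng = np.random.default_rng(0)
G = rng.standard_normal((8, 8, 6, 6))
ranks = (4, 4, 3, 3)

factors = tucker_factors(G, ranks)
core = project(G, factors)          # optimizer moments would be stored at this size
G_approx = project_back(core, factors)

# Memory saving for optimizer state: core entries vs. full gradient entries.
ratio = core.size / G.size
```

The saving comes from keeping Adam-style moment buffers at the core's size (here 4·4·3·3 = 144 entries) instead of the full gradient's (8·8·6·6 = 2304), while the weight update is reconstructed in full via `project_back` — a structure-preserving low-rank update in the spirit the summary describes.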