Reparametrizing Shampoo and SOAP for Subspace Basis Updates and BFloat16 Storage

📅 2026-05-25
🤖 AI Summary
This work addresses two problems with Shampoo-type optimizers. They have a high computational cost and large memory consumption when their preconditioning matrices are large. They also often lose performance when the preconditioner is stored in BFloat16. The authors propose a general subspace basis update mechanism that reparametrizes the preconditioner so that only a subset of basis vectors is updated, through QR decomposition in a low-dimensional subspace. The remaining basis vectors are kept fixed and used to complete the full basis. Because QR is applied only in the subspace, its cost drops substantially. The method is compatible with Shampoo variants that rely on QR decomposition, including KL-Shampoo, SOAP, and KL-SOAP. It reduces both memory usage and computational overhead under BFloat16 storage while mitigating the performance loss that BFloat16 storage otherwise causes. In experiments it improves SOAP and KL-SOAP under BFloat16 storage, with KL-SOAP matching or exceeding KL-Shampoo.
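To make the idea of "updating only part of the basis" concrete, here is a minimal NumPy sketch of one plausible reading. It is not the authors' code. We assume the optimizer maintains an orthonormal basis `Q` for a symmetric n×n preconditioner factor `P`, refreshed by one power-iteration step followed by QR. We also assume the refreshed subset `idx` is drawn at random. The function name `subspace_basis_update` and the random subset choice are hypothetical. The paper's actual reparametrization and subset selection may differ.

```python
import numpy as np


def subspace_basis_update(Q, P, idx):
    """Hypothetical subspace update of an orthonormal basis Q (n x n).

    Only the columns Q[:, idx] change: they receive one power-iteration
    step on P restricted to their span, orthonormalized by a k x k QR.
    All other columns are reused unchanged, so the combined set of fixed
    and updated columns still forms a complete orthonormal basis.
    """
    Qs = Q[:, idx]                  # n x k block selected for updating
    M = Qs.T @ (P @ Qs)             # k x k restriction of P to span(Qs)
    Qm, _ = np.linalg.qr(M)         # QR on a k x k matrix instead of n x n
    Q_new = Q.copy()
    Q_new[:, idx] = Qs @ Qm         # rotation inside span(Qs) keeps orthogonality
    return Q_new


# Usage: refresh a random subset of k basis vectors at each step (assumption).
rng = np.random.default_rng(0)
n, k = 64, 8
A = rng.standard_normal((n, n))
P = A @ A.T / n                     # symmetric PSD stand-in for a Kronecker factor
Q = np.eye(n)
for step in range(50):
    idx = rng.choice(n, size=k, replace=False)
    Q = subspace_basis_update(Q, P, idx)

ortho_err = np.linalg.norm(Q.T @ Q - np.eye(n))
print(f"||Q^T Q - I||_F = {ortho_err:.2e}")
```

Why the update stays inside span(`Qs`): the fixed columns are orthogonal to that span. So any update that keeps them fixed and the basis orthonormal must map span(`Qs`) onto itself. The k×k QR is one cheap way to do that. The QR step then scales with k rather than n. The dominant remaining cost is the n×k product `P @ Qs`.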
📝 Abstract
Shampoo-based methods, such as KL-Shampoo and SOAP, have demonstrated strong performance in training neural networks and rely on QR decomposition. Because existing QR implementations require single-precision (FP32) arithmetic and remain computationally expensive, these methods become time- and memory-intensive when their preconditioning matrices are large. Moreover, using BFloat16 (BFP16) storage to reduce memory usage can degrade the performance of Shampoo-based methods. We propose a reparametrization of the preconditioner that supports BFP16 storage and forms a complete basis by combining updated basis vectors with unchanged ones. By updating only part of the basis through QR decomposition in a subspace, our approach reduces computational overhead while mitigating the performance degradation caused by BFP16 storage. Our approach applies broadly to Shampoo-based methods that employ QR decomposition, including KL-Shampoo, SOAP, and KL-SOAP. In particular, it improves the performance of SOAP and KL-SOAP under BFP16 storage, enabling KL-SOAP to match or exceed KL-Shampoo. Overall, our approach makes Shampoo-based methods more memory- and time-efficient.
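The abstract states that BFP16 storage can degrade Shampoo-based methods. One plausible contributor is visible in a small experiment: store an orthonormal basis in BFloat16 and measure how far it drifts from orthonormality. The sketch emulates BFloat16 rounding in NumPy by bit manipulation. The helper `round_to_bf16` is hypothetical; frameworks such as PyTorch and JAX provide a native bfloat16 dtype. It illustrates precision loss only and is not the authors' reparametrization.

```python
import numpy as np


def round_to_bf16(x):
    """Emulate BFloat16 storage: round float32 values to an 8-bit significand
    (round-to-nearest-even) and return them as float32.
    NaN/Inf handling is omitted for brevity."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    lsb = (bits >> 16) & 1                          # ties round to even
    rounded = (bits + 0x7FFF + lsb) & 0xFFFF0000    # keep the top 16 bits
    return rounded.astype(np.uint32).view(np.float32)


rng = np.random.default_rng(0)
n = 256
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthonormal basis (float64)
Q32 = Q.astype(np.float32)                         # FP32 storage
Qbf = round_to_bf16(Q32)                           # emulated BF16 storage

I = np.eye(n, dtype=np.float32)
err_fp32 = np.linalg.norm(Q32.T @ Q32 - I)
err_bf16 = np.linalg.norm(Qbf.T @ Qbf - I)
print(f"FP32 storage: ||Q^T Q - I||_F = {err_fp32:.1e}")
print(f"BF16 storage: ||Q^T Q - I||_F = {err_bf16:.1e}")
```

BFloat16 keeps only 8 significand bits versus 24 for FP32. The BFloat16 copy therefore departs from orthonormality by several orders of magnitude more than the FP32 copy. This is also why the QR step itself is typically carried out after upcasting to FP32.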
Problem

Research questions and friction points this paper is trying to address.

Shampoo
QR decomposition
BFloat16
preconditioning
memory efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shampoo
BFloat16
QR decomposition
subspace basis update
preconditioning