I3S: Importance Sampling Subspace Selection for Low-Rank Optimization in LLM Pretraining

📅 2025-02-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Low-rank optimization in large language model (LLM) pretraining suffers from constrained gradient updates and subspace stagnation when it relies on a static dominant subspace. Method: We propose a dynamic low-rank subspace selection paradigm based on importance sampling. Unlike conventional approaches that retain only the top singular components of the gradient, our method samples the update subspace according to an importance probability distribution over gradient directions, the first integration of importance sampling into low-rank subspace construction. Contribution/Results: We provide theoretical guarantees that convergence is no worse than that of dominant-subspace methods. Empirically, our approach significantly improves training stability and final model performance while preserving memory efficiency, effectively mitigating subspace rigidity during pretraining.

๐Ÿ“ Abstract
Low-rank optimization has emerged as a promising approach to enabling memory-efficient training of large language models (LLMs). Existing low-rank optimization methods typically project gradients onto a low-rank subspace, reducing the memory cost of storing optimizer states. A key challenge in these methods is identifying suitable subspaces to ensure an effective optimization trajectory. Most existing approaches select the dominant subspace to preserve gradient information, as this intuitively provides the best approximation. However, we find that in practice, the dominant subspace stops changing during pretraining, thereby constraining weight updates to similar subspaces. In this paper, we propose importance sampling subspace selection (I3S) for low-rank optimization, which theoretically offers a comparable convergence rate to the dominant subspace approach. Empirically, we demonstrate that I3S significantly outperforms previous methods in LLM pretraining tasks.
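The core mechanic described above, sampling a low-rank update subspace from an importance distribution over gradient directions instead of always keeping the dominant singular vectors, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `i3s_projection` and the choice of sampling probabilities proportional to singular values are assumptions for illustration (the paper's exact importance distribution may differ).

```python
import numpy as np

def i3s_projection(grad, rank, rng):
    """Sketch of importance-sampled subspace selection (hypothetical API).

    Rather than retaining the top-`rank` left singular vectors of the
    gradient (the dominant subspace), sample `rank` directions without
    replacement with probability proportional to their singular values,
    so non-dominant directions occasionally enter the update subspace.
    """
    U, S, _ = np.linalg.svd(grad, full_matrices=False)
    probs = S / S.sum()  # importance distribution over singular directions (assumed form)
    idx = rng.choice(len(S), size=rank, replace=False, p=probs)
    P = U[:, idx]            # sampled orthonormal low-rank basis, shape (m, rank)
    low_rank_grad = P.T @ grad  # gradient projected into the subspace, shape (rank, n)
    return P, low_rank_grad

# Toy example: project a 16x8 gradient onto a sampled rank-4 subspace.
rng = np.random.default_rng(0)
G = rng.standard_normal((16, 8))
P, g_low = i3s_projection(G, rank=4, rng=rng)
print(P.shape, g_low.shape)
```

In a memory-efficient optimizer of this family, the optimizer states (e.g. Adam moments) would live in the small `(rank, n)` space, and the update would be mapped back to full size via `P @ update`, which is what yields the memory savings the abstract refers to.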
Problem

Research questions and friction points this paper is trying to address.

Optimizes memory-efficient LLM training
Selects effective low-rank subspaces
Improves LLM pretraining performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Importance sampling subspace selection
Low-rank optimization technique
Improved LLM pretraining efficiency