🤖 AI Summary
Training large language models (LLMs) suffers from instability and poor convergence. To address this, we propose POET, a novel reparameterization method and the first to introduce orthogonal equivalence transformations into LLM optimization. POET represents each weight matrix as the product of two learnable orthogonal matrices and a fixed random matrix, thereby exactly preserving the singular spectrum of the original weights. This design simultaneously improves optimization stability and generalization. To make the method scalable, we further develop an efficient approximation framework that combines orthogonal optimization, frozen random weights, and low-rank gradient reconstruction. Extensive experiments show that POET significantly accelerates convergence, improves final model performance across diverse LLM training tasks, and markedly increases training robustness. Notably, POET enables efficient and stable training of billion-parameter-scale models.
📝 Abstract
While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because it provably preserves the spectral properties of the weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
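The core reparameterization described above can be sketched numerically: a fixed random weight matrix is multiplied on both sides by orthogonal factors, which leaves its singular values unchanged. A minimal NumPy illustration of this invariance (variable names here are illustrative, not taken from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n):
    # QR decomposition of a Gaussian matrix gives a random orthogonal matrix;
    # sign correction via diag(r) makes the distribution uniform (Haar).
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))

m, n = 6, 4
W0 = rng.normal(size=(m, n))   # fixed random weight matrix (frozen)
R = random_orthogonal(m)       # learnable left orthogonal factor
P = random_orthogonal(n)       # learnable right orthogonal factor

# Orthogonal equivalence transformation of W0
W = R @ W0 @ P.T

# Singular values are invariant under orthogonal transformations on both sides
s0 = np.linalg.svd(W0, compute_uv=False)
s = np.linalg.svd(W, compute_uv=False)
assert np.allclose(s0, s)
```

During training, only the orthogonal factors would be updated (subject to an orthogonality constraint), so the spectrum fixed at initialization is preserved throughout optimization.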