🤖 AI Summary
This work addresses a limitation of conventional optimizers: they do not preserve the spectral properties of weight matrices during large language model training, which can lead to instability or geometric distortion. To overcome this, the authors propose Pion, an optimizer that, for the first time, incorporates left-right orthogonal equivalence transformations into the optimization process. Because its update is multiplicative rather than additive, Pion preserves the singular values of each weight matrix, which keeps the spectral norm fixed while still letting the matrix's geometry change. The authors also analyze the method's convergence behavior. Experiments on both pretraining and fine-tuning of large language models show that Pion is stable and competitive with mainstream optimizers, marking a departure from traditional additive optimization paradigms.
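Why an orthogonal equivalence transformation cannot change the singular values follows from a one-line argument. Write a generic step of this kind as W → RWS with orthogonal R and S (this notation is assumed here; the summary does not say how Pion constructs R and S), and take a full SVD of W:

```latex
W = U\Sigma V^{\top}
\quad\Longrightarrow\quad
RWS = (RU)\,\Sigma\,(S^{\top}V)^{\top},
\qquad (RU)^{\top}(RU) = I,\quad (S^{\top}V)^{\top}(S^{\top}V) = I
```

So RWS has an SVD with the same Σ, hence σ_i(RWS) = σ_i(W) for every i, and in particular ‖RWS‖₂ = ‖W‖₂. An additive step W − ηG carries no such guarantee.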
📝 Abstract
We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular values throughout training. This yields an optimization mechanism that modulates the geometry of weight matrices while keeping their spectral norm fixed. We derive the Pion update rule, systematically examine its design choices, and analyze its convergence behavior along with several key properties. Empirical results show that Pion offers a stable and competitive alternative to standard optimizers for both LLM pretraining and finetuning.
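To make the contrast with additive updates concrete, here is a minimal NumPy sketch. It does not reproduce the Pion update rule, whose exact form the abstract does not give. Instead it assumes a generic orthogonal equivalence step W ← Q_L W Q_R, where Q_L and Q_R come from a Cayley transform of skew-symmetric matrices built from the gradient. The helpers `cayley`, `orthogonal_equivalence_step`, and `singular_values` and the step size `lr` are hypothetical names chosen for illustration.

```python
import numpy as np


def cayley(a: np.ndarray) -> np.ndarray:
    """Map a skew-symmetric matrix a to the orthogonal matrix (I - a)^{-1} (I + a)."""
    eye = np.eye(a.shape[0])
    # I - a is always invertible for skew-symmetric a (its eigenvalues are 1 - i*lambda).
    return np.linalg.solve(eye - a, eye + a)


def orthogonal_equivalence_step(w: np.ndarray, g: np.ndarray, lr: float) -> np.ndarray:
    """Hypothetical multiplicative step w <- q_left @ w @ q_right (not the published Pion rule).

    w and g are (n, m) weight and gradient matrices. The skew-symmetric
    generators make the first-order change in w a descent direction for g;
    the Cayley transform turns them into orthogonal matrices, so the
    singular values of w are preserved up to floating-point error.
    """
    a_left = lr * (g @ w.T - w @ g.T)   # (n, n), skew-symmetric
    a_right = lr * (w.T @ g - g.T @ w)  # (m, m), skew-symmetric
    # cayley(-a / 2) = I - a + O(lr^2), so to first order w moves by
    # -(a_left @ w + w @ a_right).
    q_left = cayley(-a_left / 2)
    q_right = cayley(-a_right / 2)
    return q_left @ w @ q_right


def singular_values(m: np.ndarray) -> np.ndarray:
    """Singular values of m in descending order."""
    return np.linalg.svd(m, compute_uv=False)


rng = np.random.default_rng(0)
w = rng.standard_normal((6, 4))
g = rng.standard_normal((6, 4))

w_rot = orthogonal_equivalence_step(w, g, lr=0.1)
w_add = w - 0.1 * g  # ordinary additive (SGD-style) step for comparison

print(np.allclose(singular_values(w), singular_values(w_rot)))  # True: spectrum and spectral norm unchanged
print(singular_values(w) - singular_values(w_add))               # generally nonzero: the additive step changes the spectrum
```

The Cayley transform is used here only because it yields an exactly orthogonal matrix from a skew-symmetric one with a single linear solve; a matrix exponential or a QR-based retraction would serve the same illustrative purpose.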