The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization

📅 2025-12-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Spectral norm constraints in large language model (LLM) weight-matrix optimization can be overly restrictive, limiting generalization and adaptability. Method: We propose Fanions, a novel family of optimizers built on duals of the Ky Fan $k$-norms and of their convex combinations with the Frobenius and $\ell_\infty$ norms. We instantiate two variants, F-Fanions and S-Fanions, whose most prominent members are the concrete algorithms F-Muon and S-Muon. Our approach integrates matrix dual-norm theory, convex norm composition, and Muon-style updates. Contribution/Results: We theoretically establish a close relationship between Fanions and Dion and generalize the Muon framework to broader matrix norm structures. Empirically, F-Muon and S-Muon match Muon's performance across multiple LLM training tasks, while on a synthetic linear least-squares problem they significantly outperform vanilla Muon, demonstrating the efficacy of the proposed norm design.
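The summary compresses the mechanism, so here is a minimal sketch (not the authors' code) of the contrast it describes, using standard steepest-descent identities: a Muon step orthogonalizes the gradient via its full SVD factor $UV^\top$, while the dual ball of the Ky Fan $k$-norm yields the rank-$k$ truncation $U_k V_k^\top$, the Dion-like direction the abstract alludes to. The function names, learning rate, and absence of momentum are all simplifying assumptions.

```python
import numpy as np

def muon_like_step(W, G, lr=0.02):
    """Muon-style step: steepest descent under the spectral norm.

    The direction maximizes <G, X> over the spectral ball {X : ||X||_2 <= 1};
    with G = U diag(s) V^T, the maximizer is the full orthogonalization U V^T.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return W - lr * (U @ Vt)

def fanion_like_step(W, G, k, lr=0.02):
    """Hypothetical Fanion-style step (an illustration, not the paper's code).

    The Ky Fan k-norm of G equals max <G, X> over its dual unit ball
    {X : ||X||_2 <= 1, ||X||_nuc <= k}; for distinct singular values the
    maximizer is the rank-k truncated orthogonalization U_k V_k^T.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return W - lr * (U[:, :k] @ Vt[:k, :])
```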

📝 Abstract
In this article, we explore the use of various matrix norms for optimizing functions of weight matrices, a crucial problem in training large language models. Moving beyond the spectral norm underlying the Muon update, we leverage duals of the Ky Fan $k$-norms to introduce a family of Muon-like algorithms we name Fanions, which are closely related to Dion. By working with duals of convex combinations of the Ky Fan $k$-norms with either the Frobenius norm or the $\ell_\infty$ norm, we construct the families of F-Fanions and S-Fanions, respectively. Their most prominent members are F-Muon and S-Muon. We complement our theoretical analysis with an extensive empirical study of these algorithms across a wide range of tasks and settings, demonstrating that F-Muon and S-Muon consistently match Muon's performance, while outperforming vanilla Muon on a synthetic linear least squares problem.
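For readers unfamiliar with the norm-duality viewpoint, the following identities (standard facts, not text from the paper) show where such update directions come from.

```latex
% Steepest descent under a norm \|\cdot\| picks the direction
\[
\Delta W \;\propto\; -\operatorname*{arg\,max}_{\|X\| \le 1} \langle G, X \rangle .
\]
% Spectral-norm ball (Muon): with G = U \Sigma V^\top,
\[
X^\star = U V^\top, \qquad \langle G, X^\star \rangle = \|G\|_{\mathrm{nuc}} .
\]
% The dual of the Ky Fan k-norm is \max\{\|Y\|_2,\ \|Y\|_{\mathrm{nuc}}/k\},
% so its dual unit ball is \{X : \|X\|_2 \le 1,\ \|X\|_{\mathrm{nuc}} \le k\}:
\[
X^\star = U_k V_k^\top, \qquad
\langle G, X^\star \rangle = \sigma_1 + \cdots + \sigma_k = \|G\|_{(k)} .
\]
```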
Problem

Research questions and friction points this paper is trying to address.

Optimizing weight matrices, a central and costly step in training large language models
The spectral norm underlying the Muon update imposes an overly restrictive geometry on these updates
Existing Muon-style methods leave room for improvement on problems such as synthetic linear least squares
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using duals of Ky Fan k-norms for optimization
Introducing Fanions family with F-Fanions and S-Fanions
Combining Ky Fan k-norms with the Frobenius or ℓ∞ norm (a heuristic sketch follows this list)
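The third item is the heart of F-Fanions and S-Fanions. The exact update requires the dual of the combined norm, which the paper derives; as a loose illustration only (a heuristic interpolation, not the authors' algorithm), one can blend the two component LMO directions: the Frobenius ball gives the normalized gradient $G/\|G\|_F$, and the $\ell_\infty$ ball gives the entrywise sign $\operatorname{sign}(G)$. The mixing weight `alpha` is an assumed free parameter.

```python
import numpy as np

def f_muon_flavored_step(W, G, k, alpha=0.5, lr=0.02):
    """Illustrative F-Muon-flavored step (heuristic sketch, not the paper's
    derivation): blend the rank-k orthogonalized direction with the
    Frobenius-ball LMO direction G / ||G||_F.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    ky_fan_dir = U[:, :k] @ Vt[:k, :]
    frob_dir = G / (np.linalg.norm(G) + 1e-12)
    return W - lr * (alpha * ky_fan_dir + (1.0 - alpha) * frob_dir)

def s_muon_flavored_step(W, G, k, alpha=0.5, lr=0.02):
    """Illustrative S-Muon-flavored step: swap the Frobenius part for the
    l_infinity-ball LMO direction sign(G), as in sign-based methods.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    ky_fan_dir = U[:, :k] @ Vt[:k, :]
    return W - lr * (alpha * ky_fan_dir + (1.0 - alpha) * np.sign(G))
```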
Alexey Kravatskiy
MIPT
Ivan Kozyrev
MIPT, INM RAS
Nikolai Kozlov
MIPT
Alexander Vinogradov
MIPT
Daniil Merkulov
MIPT, Skoltech, HSE, AI4Science
Ivan Oseledets
AIRI; Skolkovo Institute of Science and Technology
Numerical mathematics · tensors · deep learning · machine learning · matrix analysis