DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Structured approximations of the Fisher Information Matrix (FIM) in large-scale model training face a fundamental trade-off between computational efficiency and approximation accuracy. Method: We propose DyKAF (Dynamical Kronecker Approximation of the Fisher Matrix), the first method to incorporate projector-splitting dynamics into Kronecker-factorized FIM approximation. DyKAF performs dynamic factor updates and employs a projector-splitting integrator, operating directly in matrix space, to learn preconditioners that are efficient, numerically stable, and high-fidelity, with no manual hyperparameter tuning required. Contribution/Results: DyKAF significantly improves FIM approximation quality and optimization robustness. In large language model pretraining and fine-tuning, it consistently outperforms state-of-the-art optimizers in convergence speed, final task performance, and generalization, demonstrating both theoretical soundness and practical efficacy.

📝 Abstract
Recently, optimizers that explicitly treat weights as matrices, rather than flattened vectors, have demonstrated their effectiveness. This perspective naturally leads to structured approximations of the Fisher matrix as preconditioners, where the matrix view induces a Kronecker-factorized form that enables memory-efficient representation. However, constructing such approximations both efficiently and accurately remains an open challenge, since obtaining the optimal factorization is resource-intensive and practical methods therefore rely on heuristic design choices. In this work, we introduce a novel approach that leverages projector-splitting integrators to construct effective preconditioners. Our optimizer, DyKAF (Dynamical Kronecker Approximation of the Fisher Matrix), consistently improves the Fisher matrix approximation quality. Experiments on large language model pre-training and fine-tuning demonstrate that DyKAF outperforms existing optimizers across a range of evaluation metrics.
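The memory advantage of the Kronecker-factorized form the abstract mentions can be made concrete with a small sketch. The snippet below illustrates generic Kronecker-factored preconditioning of a matrix-shaped gradient, not DyKAF's actual update rule; the factor names `L`, `R` and the damping constant are assumptions made for the example.

```python
import numpy as np

# For a weight matrix W (m x n), the full Fisher matrix over vec(W) is
# (mn x mn). Approximating it as R ⊗ L with small SPD factors L (m x m)
# and R (n x n) lets the preconditioned step be computed in matrix space
# as L^{-1} G R^{-1}, never materializing the mn x mn matrix.

def kron_precondition(G, L, R, damping=1e-4):
    """Apply (R ⊗ L)^{-1} to vec(G), computed entirely in matrix space."""
    m, n = G.shape
    L_d = L + damping * np.eye(m)  # damping for numerical stability
    R_d = R + damping * np.eye(n)
    # solve(R_d.T, G.T).T == G @ inv(R_d); then left-solve with L_d.
    return np.linalg.solve(L_d, np.linalg.solve(R_d.T, G.T).T)

rng = np.random.default_rng(0)
m, n = 4, 3
G = rng.standard_normal((m, n))
A = rng.standard_normal((m, m)); L = A @ A.T  # toy SPD factors
B = rng.standard_normal((n, n)); R = B @ B.T

step = kron_precondition(G, L, R)
```

For these toy sizes the result can be checked against the explicit `np.kron(R, L)` system on `vec(G)`; at realistic layer sizes that explicit matrix would be far too large to form, which is exactly the point of the factorization.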
Problem

Research questions and friction points this paper is trying to address.

Improving Kronecker factorization of Fisher matrix for gradient preconditioning
Enabling efficient and accurate Fisher matrix approximations for optimization
Developing dynamical Kronecker approximation method for neural network training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages projector-splitting integrators for preconditioners
Dynamically improves Fisher matrix approximation quality
Outperforms existing optimizers in language model tasks
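For background on the projector-splitting idea referenced above, the classical first-order KSL integrator of Lubich and Oseledets for dynamical low-rank approximation can be sketched as follows. This is the standard textbook scheme, not necessarily the exact integrator DyKAF employs; the toy flow, step size, and variable names are illustrative assumptions.

```python
import numpy as np

# One step of the first-order projector-splitting (KSL) integrator for
# dynamical low-rank approximation: evolve a rank-r factorization
# Y ≈ U S V^T under dY/dt = F(Y) via K-, S-, and L-substeps.

def ksl_step(U, S, V, F, h):
    """Explicit-Euler KSL step; U (m x r), S (r x r), V (n x r)."""
    # K-step: advance K = U S with V frozen, then re-orthogonalize.
    K = U @ S + h * F(U @ S @ V.T) @ V
    U1, S_hat = np.linalg.qr(K)
    # S-step: the middle flow of the splitting runs S backwards in time.
    S_tilde = S_hat - h * U1.T @ F(U1 @ S_hat @ V.T) @ V
    # L-step: advance L = V S^T with U frozen, then re-orthogonalize.
    L = V @ S_tilde.T + h * F(U1 @ S_tilde @ V.T).T @ U1
    V1, S1T = np.linalg.qr(L)
    return U1, S1T.T, V1

# Toy usage: track the decay flow dY/dt = A Y for a rank-2 initial Y.
rng = np.random.default_rng(1)
m, n, r = 6, 5, 2
A = -0.5 * np.eye(m)
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
S = np.diag([2.0, 1.0])
Y_ref, h = U @ S @ V.T, 0.01
for _ in range(100):
    U, S, V = ksl_step(U, S, V, lambda Y: A @ Y, h)
    Y_ref = Y_ref + h * A @ Y_ref  # Euler reference on the full matrix
err = np.linalg.norm(U @ S @ V.T - Y_ref)
```

A notable property of this scheme, and a plausible reason it is attractive for preconditioner construction, is that the QR factorizations keep `U` and `V` exactly orthonormal at every step, avoiding the drift that plain gradient updates of the factors would accumulate.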