🤖 AI Summary
Structured approximations of the Fisher Information Matrix (FIM) in large-scale model training face a fundamental trade-off between computational efficiency and approximation accuracy.
Method: We propose DyKAF (Dynamical Kronecker Approximation of the Fisher Matrix), the first method to incorporate projector-splitting dynamics into Kronecker-factorized FIM approximation. DyKAF performs dynamic factor updates and employs a projector-splitting integrator, operating directly in matrix space, to learn preconditioners that are efficient, numerically stable, and high-fidelity, without manual hyperparameter tuning.
Contribution/Results: DyKAF significantly improves FIM approximation quality and optimization robustness. In large language model pretraining and fine-tuning, it consistently outperforms state-of-the-art optimizers in convergence speed, final task performance, and generalization ability, demonstrating both theoretical soundness and practical efficacy.
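To ground the summary, here is a minimal sketch of what a Kronecker-factorized Fisher preconditioner looks like in general (K-FAC-style; this illustrates the factorized form the paper builds on, not DyKAF's specific update rule). The function names, the damping parameter, and the EMA factor update are illustrative assumptions.

```python
import numpy as np

def kron_factored_precondition(grad, A, S, damping=1e-3):
    """Apply an inverse Kronecker-factored Fisher approximation F ≈ S ⊗ A
    to a layer's gradient matrix: F^{-1} vec(G) = vec(A^{-1} G S^{-1})."""
    m, n = grad.shape
    A_damped = A + damping * np.eye(m)  # damping keeps the solve well-posed
    S_damped = S + damping * np.eye(n)
    return np.linalg.solve(A_damped, grad) @ np.linalg.inv(S_damped)

def update_factor(F, x, beta=0.95):
    """Running (EMA) estimate of a Kronecker factor from per-step vectors x
    (e.g. layer activations or backpropagated signals)."""
    return beta * F + (1 - beta) * np.outer(x, x)
```

With identity factors and zero damping the preconditioner is a no-op, which is a quick sanity check that the matrix-space formula matches the vectorized one.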
📝 Abstract
Recently, optimizers that explicitly treat weights as matrices, rather than flattened vectors, have demonstrated their effectiveness. This perspective naturally leads to structured approximations of the Fisher matrix as preconditioners, where the matrix view induces a Kronecker-factorized form that enables memory-efficient representation. However, constructing such approximations both efficiently and accurately remains an open challenge, since obtaining the optimal factorization is resource-intensive and practical methods therefore rely on heuristic design choices. In this work, we introduce a novel approach that leverages projector-splitting integrators to construct effective preconditioners. Our optimizer, DyKAF (Dynamical Kronecker Approximation of the Fisher Matrix), consistently improves the Fisher matrix approximation quality. Experiments on large language model pre-training and fine-tuning demonstrate that DyKAF outperforms existing optimizers across a range of evaluation metrics.
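For readers unfamiliar with the integrator the abstract refers to, below is a sketch of one first-order projector-splitting (KSL) step for dynamical low-rank approximation, the general technique DyKAF adapts. This is the standard Lie-Trotter splitting, not the paper's exact algorithm; the function name and interface are assumptions for illustration.

```python
import numpy as np

def projector_splitting_step(U, S, V, dA):
    """One first-order projector-splitting (KSL) step: update a rank-r
    factorization Y ≈ U @ S @ V.T by an increment dA of the full matrix,
    staying on the rank-r manifold."""
    # K-step: absorb the increment into the left factor, then re-orthogonalize
    K = U @ S + dA @ V
    U1, S_hat = np.linalg.qr(K)
    # S-step: subtract the doubly projected increment (note the minus sign,
    # the hallmark of the projector-splitting scheme)
    S_tilde = S_hat - U1.T @ dA @ V
    # L-step: absorb the increment into the right factor
    L = V @ S_tilde.T + dA.T @ U1
    V1, R = np.linalg.qr(L)
    return U1, R.T, V1
```

A useful property of this integrator is exactness: if the increment keeps the matrix within rank r (e.g. dA = U @ C @ V.T), the step reproduces U @ (S + C) @ V.T exactly, which also makes it easy to unit-test.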