🤖 AI Summary
To address the high computational cost and slow convergence of online fine-tuning for large language models (LLMs), this paper proposes the Low-rank Kalman Optimizer (LoKO), the first method to integrate Kalman filtering into parameter-efficient fine-tuning (PEFT). LoKO performs online state estimation with linear complexity atop LoRA's low-rank parameter structure, using a diagonal covariance approximation and a robust observation-noise estimation mechanism to overcome the computational and numerical-stability bottlenecks of conventional Kalman filtering at scale. Experiments show that LoKO converges in significantly fewer iterations than AdamW+LoRA on image classification and language understanding tasks, while generalizing well across multiple vision and language foundation models. By enabling efficient and stable online adaptation, LoKO points toward a scalable filtering-based alternative to gradient-based optimizers for fine-tuning large models.
📄 Abstract
Training large models with millions or even billions of parameters from scratch incurs substantial computational costs. Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), address this challenge by adapting only a reduced number of parameters to specific tasks with gradient-based optimizers. In this paper, we cast PEFT as an optimal filtering/state estimation problem and present the Low-Rank Kalman Optimizer (LoKO) to estimate the optimal trainable parameters in an online manner. We leverage the low-rank decomposition in LoRA to significantly reduce matrix sizes in the Kalman iterations, and further capitalize on a diagonal approximation of the covariance matrix to decrease computational complexity from quadratic to linear in the number of trainable parameters. Moreover, we found that the initialization of the covariance matrix within the Kalman algorithm and the accurate estimation of the observation noise covariance are key to this formulation, and we propose robust approaches that work well across a vast range of well-established computer vision and language models. Our results show that LoKO converges in fewer iterations and yields better-performing models compared to commonly used optimizers with LoRA on both image classification and language tasks. Our study opens up the possibility of leveraging the Kalman filter as an effective optimizer for the online fine-tuning of large models.
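To make the diagonal approximation concrete, here is a minimal sketch of one Kalman update step for a scalar observation, with the covariance matrix reduced to its diagonal so every operation is O(n) in the number of trainable parameters. This is an illustrative simplification, not the paper's exact algorithm: the function and variable names (`diag_kalman_step`, `theta`, `p`, `r`) are hypothetical, and LoKO's specific covariance initialization and noise-estimation schemes are omitted.

```python
import numpy as np

def diag_kalman_step(theta, p, jac, y, y_hat, r):
    """One Kalman update with a diagonal covariance approximation.

    theta : (n,) trainable parameters (e.g. flattened LoRA factors)
    p     : (n,) diagonal of the state covariance P
    jac   : (n,) Jacobian of the model output w.r.t. theta
    y     : scalar observed target; y_hat : scalar model prediction
    r     : scalar observation-noise variance
    All operations below are element-wise, so the cost is linear in n.
    """
    # Innovation variance: S = H P H^T + R, with P treated as diagonal.
    s = float(np.dot(jac * p, jac)) + r
    # Kalman gain: K = P H^T / S  (element-wise for diagonal P).
    k = p * jac / s
    # State update driven by the prediction error (innovation).
    theta = theta + k * (y - y_hat)
    # Covariance update: keep only the diagonal of (I - K H) P.
    p = p * (1.0 - k * jac)
    return theta, p
```

On a linear model (`y_hat = theta @ x`, so `jac = x`), repeated calls behave like a per-coordinate recursive least-squares update: the gain shrinks as each diagonal covariance entry decreases, which is what lets the filter converge without a hand-tuned learning-rate schedule.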