🤖 AI Summary
This work investigates how the capabilities of large language models (LLMs) evolve during pretraining and fine-tuning from a loss landscape perspective, specifically asking whether fine-tuning degrades pretrained foundational capabilities.
Method: We propose a dual-perspective loss landscape analysis framework that characterizes both the most-case capability basin (the landscape along most directions) and the worst-case capability basin (the landscape along the worst direction), integrating direction-sensitive modeling, high-dimensional optimization theory, and overparameterization analysis.
Contribution/Results: We theoretically prove, for the first time, that the size of the most-case basin bounds both the size of the worst-case basin and robustness to input perturbations. We show that overparameterization substantially expands capability basins—by up to 5×—and establish a geometric criterion: benign fine-tuning that stays within the most-case basin, and any fine-tuning (including adversarial fine-tuning) that stays within the worst-case basin, preserves the corresponding capability. Experiments quantitatively validate the correlation between basin size and robustness boundaries, providing an interpretable geometric foundation for safe and controllable fine-tuning.
📝 Abstract
Recent studies have revealed that the loss landscape of large language models resembles a basin, within which the models perform nearly identically, and outside of which they lose all their capabilities. In this work, we conduct further studies on the loss landscape of large language models. We discover that pre-training creates a "basic capability" basin, and subsequent fine-tuning creates "specific capability" basins (e.g., math, safety, coding) within the basic capability basin. We further investigate two types of loss landscapes: the most-case landscape (i.e., the landscape along most directions) and the worst-case landscape (i.e., the landscape along the worst direction). We argue that as long as benign fine-tuning remains within the most-case basin, it will not compromise previous capabilities. Similarly, any fine-tuning (including the adversarial one) that stays within the worst-case basin would not compromise previous capabilities. Finally, we theoretically demonstrate that the size of the most-case basin can bound the size of the worst-case basin and the robustness with respect to input perturbations. We also show that, due to the over-parameterization property of current large language models, one can easily enlarge the basins by five times.
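The most-case vs. worst-case distinction can be illustrated on a toy anisotropic quadratic landscape (a minimal NumPy sketch; the dimension, eigenvalues, and loss threshold below are illustrative assumptions, not the paper's actual models or measurements). Along a random direction the curvature concentrates near the average eigenvalue, so the basin appears wide; along the sharpest eigendirection the same loss budget is exhausted much sooner, so the worst-case radius is strictly smaller:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic "loss" standing in for an LLM loss landscape:
# one sharp eigendirection, many flat ones (hypothetical values).
d = 200
eigvals = np.ones(d)
eigvals[0] = 100.0            # the single sharp ("worst-case") direction
H = np.diag(eigvals)

threshold = 1.0               # loss budget defining the capability basin

def loss(theta):
    """Quadratic loss around the basin center theta = 0."""
    return 0.5 * theta @ H @ theta

def basin_radius(direction):
    """Largest step r along `direction` keeping loss(r * u) <= threshold."""
    u = direction / np.linalg.norm(direction)
    curvature = u @ H @ u     # second derivative of the loss along u
    return np.sqrt(2.0 * threshold / curvature)

# Most-case radius: average over random directions (typical curvature
# is close to the mean eigenvalue, so the basin looks wide).
most_case = np.mean([basin_radius(rng.standard_normal(d)) for _ in range(50)])

# Worst-case radius: along the top eigendirection of H (max curvature).
worst_case = basin_radius(np.eye(d)[0])

print(f"most-case radius  ~ {most_case:.3f}")
print(f"worst-case radius ~ {worst_case:.3f}")
```

In this toy setting the worst-case radius is sqrt(2·threshold / λ_max), so the most-case radius upper-bounds it whenever sharp directions are rare, which is the geometric intuition behind bounding the worst-case basin by the most-case one.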