🤖 AI Summary
Fractional Gradient Descent (FGD) suffers from unstable convergence in non-convex optimization and from the difficulty of adaptively scheduling its hyperparameters, such as the fractional order and step size; in particular, it lacks a dynamic mechanism for history-dependent hyperparameter adjustment. This work introduces meta-learning to Caputo-type Fractional Gradient Descent (CFGD) for the first time, proposing a differentiable meta-controller that dynamically tunes CFGD's key hyperparameters during training. By integrating fractional calculus with meta-learning, the method enables history-aware, adaptive hyperparameter tuning. Experiments on multiple benchmark tasks demonstrate that the proposed approach significantly outperforms hand-tuned CFGD in both convergence speed and robustness and, in some scenarios, matches the performance of a fully black-box meta-learned optimizer. To our knowledge, this is the first differentiable, trainable, and practical meta-learning framework tailored to fractional-order optimization.
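To make the CFGD update concrete, the sketch below uses a widely used first-order approximation of the Caputo fractional derivative: for order α ∈ (0, 1), the update scales the ordinary gradient by |x − c|^(1−α) / Γ(2−α), where c is a reference point (here taken as the previous iterate). This is an illustrative assumption, not necessarily the paper's exact formulation; the function name `cfgd_step` and the toy quadratic objective are hypothetical.

```python
import math
import numpy as np

def cfgd_step(x, x_ref, grad, lr=0.1, alpha=0.9):
    """One Caputo-FGD step via a common first-order approximation.

    For alpha in (0, 1), the Caputo fractional derivative of f at x with
    lower terminal c is approximated elementwise by
        f'(x) * |x - c|^(1 - alpha) / Gamma(2 - alpha),
    which recovers plain gradient descent as alpha -> 1.
    Here x_ref plays the role of the terminal c (e.g. the previous iterate).
    """
    scale = np.abs(x - x_ref) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return x - lr * scale * grad

# Minimize f(x) = 0.5 * ||x||^2 (so grad f(x) = x) from x0 = [2, -3].
x_prev = np.array([2.5, -3.5])
x = np.array([2.0, -3.0])
for _ in range(200):
    x_new = cfgd_step(x, x_prev, grad=x, lr=0.1, alpha=0.9)
    x_prev, x = x, x_new
print(np.linalg.norm(x))  # the norm shrinks toward 0
```

Note how the history term |x − x_ref|^(1−α) automatically damps the step as iterates cluster, which is one intuition for why scheduling α and the step size jointly matters.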
📝 Abstract
Fractional Gradient Descent (FGD) offers a novel and promising way to accelerate optimization by incorporating fractional calculus into machine learning. Although FGD has shown encouraging initial results across various optimization tasks, it faces significant challenges with convergence behavior and hyperparameter selection. Moreover, the impact of its hyperparameters is not fully understood, and scheduling them is particularly difficult in non-convex settings such as neural network training. To address these issues, we propose a novel approach called Learning to Optimize Caputo Fractional Gradient Descent (L2O-CFGD), which meta-learns how to dynamically tune the hyperparameters of Caputo FGD (CFGD). Our method's meta-learned schedule outperforms CFGD with static hyperparameters found through an extensive search and, in some tasks, achieves performance comparable to a fully black-box meta-learned optimizer. L2O-CFGD can thus serve as a powerful tool for researchers to identify high-performing hyperparameters and gain insight into how to leverage the history-dependence of the fractional derivative in optimization.
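The "learning to optimize" idea described above can be sketched as follows: a small differentiable controller emits per-step hyperparameters (α_t, η_t), and its weights are trained by unrolling the inner CFGD trajectory and backpropagating the final inner loss through every step. Everything here is a minimal illustration under assumed choices, not the paper's architecture: the MLP controller, its input features, the toy quadratic objective, and the first-order Caputo approximation are all hypothetical.

```python
import torch

torch.manual_seed(0)

# Hypothetical meta-controller: maps simple step statistics to CFGD
# hyperparameters (alpha_t, lr_t). Trained by unrolled differentiation.
controller = torch.nn.Sequential(
    torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2)
)
meta_opt = torch.optim.Adam(controller.parameters(), lr=1e-2)

def unrolled_loss(n_steps=20):
    """Run n_steps of controller-driven CFGD on f(x) = 0.5 * ||x||^2
    and return the final inner loss (differentiable w.r.t. controller)."""
    x_prev = torch.tensor([2.5, -3.5])
    x = torch.tensor([2.0, -3.0])
    for _ in range(n_steps):
        grad = x  # analytic gradient of the toy quadratic
        feats = torch.stack([x.norm(), (x - x_prev).norm()]).detach()
        out = controller(feats)
        alpha = torch.sigmoid(out[0])     # keep fractional order in (0, 1)
        lr = 0.2 * torch.sigmoid(out[1])  # bounded step size
        # First-order Caputo-style scaling (eps avoids pow(0) gradients).
        scale = ((x - x_prev).abs() + 1e-8).pow(1 - alpha) \
                / torch.exp(torch.lgamma(2 - alpha))
        x_prev, x = x, x - lr * scale * grad
    return 0.5 * (x ** 2).sum()

init_loss = unrolled_loss().item()  # before meta-training
for _ in range(100):                # outer (meta) loop
    meta_opt.zero_grad()
    loss = unrolled_loss()
    loss.backward()                 # backprop through the whole unroll
    meta_opt.step()
final_loss = unrolled_loss().item()
print(init_loss, final_loss)
```

The key design point this sketch illustrates: because the hyperparameters enter the update differentiably, the schedule can be trained with ordinary gradient descent on the unrolled trajectory, rather than treated as a black-box search problem.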