🤖 AI Summary
To address the high memory consumption and computational complexity of forward-mode differentiation in training Neural Fractional Differential Equations (Neural FDEs), this work introduces, for the first time, adjoint-based backpropagation into the Neural FDE training framework. By formulating and solving an augmented fractional-order adjoint equation, the method enables efficient time-reversed gradient computation, overcoming the scalability limitations of conventional forward-mode differentiation. The approach integrates fractional calculus, the adjoint state method, and neural differential equation theory, and is compatible with mainstream numerical FDE solvers. Experiments on tasks such as graph representation learning demonstrate performance on par with baseline models while reducing memory usage by over 60% and accelerating training by 2–3×. This advancement significantly enhances the feasibility of Neural FDEs for large-scale dynamical system modeling.
📝 Abstract
Fractional-order differential equations (FDEs) generalize traditional differential equations by extending the order of differential operators from integers to real numbers, offering greater flexibility in modeling complex dynamical systems with nonlocal characteristics. Recent progress at the intersection of FDEs and deep learning has catalyzed a new wave of innovative models, demonstrating potential for tasks such as graph representation learning. However, training neural FDEs has primarily relied on direct differentiation through the forward-pass operations of FDE numerical solvers, leading to high memory usage and computational complexity, particularly in large-scale applications. To address these challenges, we propose a scalable adjoint backpropagation method for training neural FDEs by solving an augmented FDE backward in time, which substantially reduces memory requirements. This approach provides a practical neural FDE toolbox and holds considerable promise for diverse applications. We demonstrate the effectiveness of our method on several tasks, achieving performance comparable to baseline models while significantly reducing computational overhead.
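The adjoint idea underlying the method can be sketched in the integer-order special case (order α = 1, i.e. a plain neural ODE): instead of differentiating through every solver step, one sets the adjoint state a(T) = ∂L/∂z(T) at the final time and integrates a'(t) = −(∂f/∂z)ᵀ a(t) backward, accumulating the parameter gradient ∫ aᵀ ∂f/∂θ dt along the way. The paper's contribution is the fractional-order analogue via an augmented FDE; the toy dynamics, dimensions, and all names below are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Illustrative setup (not from the paper): dynamics z' = f(z) = tanh(W z),
# explicit Euler over [0, T], terminal loss L = 0.5 * ||z(T) - target||^2.
rng = np.random.default_rng(0)
D = 3
W = rng.normal(size=(D, D)) * 0.5
z0 = rng.normal(size=D)
target = np.ones(D)
T, N = 1.0, 1000
h = T / N

def f(z):
    return np.tanh(W @ z)

def f_jac_z(z):
    # d tanh(Wz)/dz = diag(1 - tanh^2(Wz)) @ W
    s = 1.0 - np.tanh(W @ z) ** 2
    return s[:, None] * W

def vjp_W(z, a):
    # a^T (d f / d W): contribution of one time step to dL/dW
    s = 1.0 - np.tanh(W @ z) ** 2
    return np.outer(a * s, z)

def forward(z0):
    z = z0.copy()
    for _ in range(N):
        z = z + h * f(z)
    return z

def loss(zT):
    return 0.5 * np.sum((zT - target) ** 2)

def adjoint_grad(z0):
    # Forward solve. For clarity this short sketch stores the trajectory;
    # the adjoint method's memory saving comes from instead re-generating
    # z(t) backward as part of the augmented system (and, for fractional
    # orders, handling the nonlocal history term in the augmented FDE).
    traj = [z0.copy()]
    z = z0.copy()
    for _ in range(N):
        z = z + h * f(z)
        traj.append(z.copy())
    a = traj[-1] - target            # a(T) = dL/dz(T)
    gW = np.zeros_like(W)
    for k in range(N - 1, -1, -1):
        zk = traj[k]
        gW += h * vjp_W(zk, a)           # accumulate a^T df/dW dt
        a = a + h * (f_jac_z(zk).T @ a)  # Euler step of a' = -J^T a, backward
    return gW

# Finite-difference check on one weight entry.
g = adjoint_grad(z0)
eps = 1e-5
W[0, 1] += eps; Lp = loss(forward(z0))
W[0, 1] -= 2 * eps; Lm = loss(forward(z0))
W[0, 1] += eps
fd = (Lp - Lm) / (2 * eps)
```

For this explicit-Euler discretization the backward recursion coincides with exact reverse-mode differentiation of the forward solver, so the adjoint gradient matches the finite-difference estimate up to floating-point error; in the fractional setting the backward pass instead solves the augmented fractional adjoint equation with a compatible FDE solver.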