🤖 AI Summary
This paper addresses the geometric inconsistency of gradient descent across arbitrary coordinate systems and manifolds with trainable curvature. The authors propose Covariant Gradient Descent (CGD), a framework grounded in covariant differential geometry that explicitly constructs a covariant force vector (the first statistical moment of the gradients) and a covariant metric tensor (the second statistical moment), thereby achieving the first strictly covariant formulation of gradient descent. Theoretically, CGD is provably invariant under arbitrary coordinate transformations and changes of manifold curvature, while retaining linear computational complexity. RMSProp and Adam emerge as special cases of CGD in Euclidean space under specific metric choices; moreover, CGD generalizes these methods geometrically and improves their performance. Empirical results demonstrate that CGD significantly outperforms mainstream optimizers in both convergence stability and model generalization across diverse tasks.
📝 Abstract
We present a manifestly covariant formulation of the gradient descent method, ensuring consistency across arbitrary coordinate systems and general curved trainable spaces. The optimization dynamics are defined using a covariant force vector and a covariant metric tensor, computed from the first and second statistical moments of the gradients, respectively. These moments are estimated through time-averaging with an exponential weight function, which preserves linear computational complexity. We show that commonly used optimization methods such as RMSProp and Adam correspond to special limits of covariant gradient descent (CGD) and demonstrate how these methods can be further generalized and improved.
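The abstract's ingredients (exponentially weighted first and second gradient moments, with the second moment acting as a metric) can be illustrated with a minimal sketch of the diagonal-metric limit, where the update reduces to an Adam-like rule. This is an assumption-laden illustration, not the paper's exact algorithm: the function name `cgd_step` and all hyperparameter values are hypothetical, and the full CGD of the paper would use a general (non-diagonal) metric tensor.

```python
import numpy as np

def cgd_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of a CGD-style update in the diagonal-metric (Adam-like) limit.

    m: exponentially time-averaged first moment of the gradient (the "force vector")
    v: exponentially time-averaged second moment (here a diagonal "metric tensor")
    Both running averages cost O(d) per step, i.e. linear in the parameter count.
    """
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad        # first statistical moment
    v = beta2 * v + (1 - beta2) * grad**2     # second statistical moment (diagonal)
    m_hat = m / (1 - beta1**t)                # bias corrections for the
    v_hat = v / (1 - beta2**t)                # exponential weight function
    # Update: force vector contracted with the inverse (diagonal) metric
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Usage: minimize f(x) = x^2 from a distant starting point
theta = np.array([5.0])
state = (np.zeros(1), np.zeros(1), 0)
for _ in range(2000):
    theta, state = cgd_step(theta, 2 * theta, state, lr=0.05)
```

A general curved trainable space would replace the elementwise division by `sqrt(v_hat)` with multiplication by the inverse of a full metric tensor, which is where the geometric generalization beyond RMSProp/Adam would enter.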