🤖 AI Summary
Orthogonal gradient descent (OGD) for continual learning ignores the geometric structure of the parameter space, which can lead to poor convergence and catastrophic forgetting. To address this, we propose Orthogonal Natural Gradient Descent (ONG), the first method to couple natural gradients with orthogonal projection, enabling geometry-aware updates on the Riemannian manifold that are restricted to directions that do not interfere with previous tasks. ONG uses EKFAC to efficiently approximate the inverse Fisher information matrix, yielding a scalable natural orthogonal projection. We show theoretically that its update direction simultaneously ensures task decoupling and information-geometric optimality. Empirically, on the Permuted and Rotated MNIST benchmarks, ONG significantly outperforms existing orthogonal and natural gradient methods, substantially mitigating forgetting while improving both convergence speed and stability across multiple tasks.
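For concreteness, one plausible way to write the update described above is given below. This is a sketch under stated assumptions, not the paper's exact formulation: $F$ is the (EKFAC-approximated) Fisher information matrix, $g_t = \nabla_\theta \mathcal{L}_t(\theta)$ is the gradient of the current task's loss, $\{u_i\}$ is an assumed orthonormal basis (for example from Gram-Schmidt) of the stored prior-task gradients, $\eta$ is a learning rate, and the projection is assumed to be Euclidean, as in OGD.

$$
\tilde{g}_t = F^{-1} g_t, \qquad
\tilde{g}_t^{\perp} = \tilde{g}_t - \sum_i \langle \tilde{g}_t, u_i \rangle\, u_i, \qquad
\theta \leftarrow \theta - \eta\, \tilde{g}_t^{\perp}
$$

Since $\langle \tilde{g}_t^{\perp}, u_i \rangle = 0$ for every $i$, a small step leaves unchanged, to first order, the quantities whose gradients span the $u_i$.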
📝 Abstract
Orthogonal gradient descent has emerged as a powerful method for continual learning. However, its Euclidean projections overlook the underlying information-geometric structure of the space of distributions parametrized by neural networks, which can lead to suboptimal convergence. To counteract this, we combine it with the natural gradient and present ONG (Orthogonal Natural Gradient Descent). ONG preconditions each new task gradient with an efficient EKFAC approximation of the inverse Fisher information matrix, yielding updates that follow the steepest descent direction under the Riemannian metric induced by the Fisher information. To preserve performance on previously learned tasks, ONG projects these natural gradients onto the orthogonal complement of prior task gradients. We provide a theoretical justification for this procedure, introduce the ONG algorithm, and benchmark its performance on the Permuted and Rotated MNIST datasets. All code needed to reproduce our experiments is available at https://github.com/yajatyadav/orthogonal-natural-gradient.
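The two steps named in the abstract (EKFAC preconditioning, then projection onto the orthogonal complement of prior-task gradients) can be sketched in NumPy for a single linear layer. This is a minimal illustration under assumptions, not the authors' implementation in the linked repository. The function names (`ekfac_precondition`, `project_orthogonal`, `add_to_basis`, `ong_step`) are hypothetical. The per-sample layer inputs `acts` and output gradients `grads` are taken as given; for a true Fisher they would come from labels sampled from the model. The damping and learning rate are arbitrary, the projection is Euclidean with a Gram-Schmidt basis as in OGD, and the data is random.

```python
import numpy as np


def ekfac_precondition(acts, grads, grad_W, damping=1e-3):
    """Approximate F^{-1} applied to grad_W for one linear layer, EKFAC-style.

    acts:   (N, d_in)  per-sample layer inputs a_n
    grads:  (N, d_out) per-sample gradients g_n w.r.t. the layer's outputs
    grad_W: (d_out, d_in) gradient to precondition
    """
    n = acts.shape[0]
    # K-FAC Kronecker factors: A = E[a a^T], G = E[g g^T]
    A = acts.T @ acts / n
    G = grads.T @ grads / n
    # Eigenbases of the two symmetric factors
    _, U_A = np.linalg.eigh(A)
    _, U_G = np.linalg.eigh(G)
    # Per-sample gradients g_n a_n^T expressed in the Kronecker eigenbasis
    rotated_samples = np.einsum("no,ni->noi", grads @ U_G, acts @ U_A)
    # EKFAC: refit the diagonal scaling as second moments in that basis
    scale = (rotated_samples ** 2).mean(axis=0)
    # Rotate, divide by the damped scaling, rotate back
    rotated = U_G.T @ grad_W @ U_A
    return U_G @ (rotated / (scale + damping)) @ U_A.T


def project_orthogonal(v, basis):
    """Remove from v its components along each vector of an orthonormal basis."""
    for u in basis:
        v = v - np.dot(v, u) * u
    return v


def add_to_basis(g, basis, tol=1e-10):
    """Gram-Schmidt: store the normalized part of g not already spanned by basis."""
    r = project_orthogonal(g, basis)
    norm = np.linalg.norm(r)
    if norm > tol:
        basis.append(r / norm)
    return basis


def ong_step(W, grad_W, acts, grads, basis, lr=0.1):
    """One hypothetical ONG-style step: precondition, project, then descend."""
    natural = ekfac_precondition(acts, grads, grad_W)
    direction = project_orthogonal(natural.ravel(), basis)
    return W - lr * direction.reshape(W.shape)


# Toy usage with random data (all values hypothetical)
rng = np.random.default_rng(0)
d_out, d_in, n = 3, 4, 64
acts = rng.normal(size=(n, d_in))
grads = rng.normal(size=(n, d_out))
W = rng.normal(size=(d_out, d_in))

# Stand-ins for gradients stored after a previous task
basis = []
for _ in range(2):
    add_to_basis(rng.normal(size=d_out * d_in), basis)

grad_W = grads.T @ acts / n  # mean of the per-sample gradients g_n a_n^T
W_new = ong_step(W, grad_W, acts, grads, basis)
delta = (W_new - W).ravel()
# Largest overlap with a stored direction: zero up to floating-point error
print(max(abs(np.dot(delta, u)) for u in basis))
```

Working in the Kronecker eigenbasis keeps the inverse cheap: only the two small factor matrices are eigendecomposed. Refitting the per-entry scaling from per-sample second moments, instead of using products of the factor eigenvalues, is what distinguishes EKFAC from plain K-FAC.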