🤖 AI Summary
This work addresses a key limitation of existing curiosity-driven exploration methods: their learning progress signals, such as prediction error, often fail to distinguish meaningful, learnable patterns from random noise. To overcome this, the paper introduces Gradient-Momentum Coupling (GMC), a signal that quantifies each sample's contribution to ongoing learning by computing the per-parameter normalized absolute product of its gradient and the momentum accumulated from previous gradients. Because momentum naturally filters noise and oscillations, GMC suppresses spurious signals and yields an automatic curriculum that prioritizes tasks by learning speed rather than difficulty. Controlled experiments demonstrate noise robustness and emergent curriculum learning, and experiments on MiniGrid suggest that replacing prediction error with GMC within existing curiosity-driven architectures can improve robustness to observation noise.
📝 Abstract
Measuring learning progress is essential for curiosity-driven exploration in reinforcement learning, but widely used signals such as prediction error often fail to distinguish meaningful, learnable patterns from random noise. This paper proposes Gradient-Momentum Coupling (GMC), a signal derived from optimization dynamics that quantifies how useful each sample's gradient is for ongoing learning by measuring its per-parameter normalized absolute product with the momentum from previous gradients. By leveraging momentum's natural filtering of noise and oscillations, GMC identifies samples that contribute to ongoing parameter updates. Controlled experiments demonstrate noise robustness and emergent curriculum learning, with the signal prioritizing tasks by learning speed rather than difficulty. Experiments on MiniGrid suggest that replacing prediction error with GMC within existing curiosity-driven architectures can improve robustness to observation noise.
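The coupling described above can be illustrated with a small NumPy sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: momentum is taken to be an exponential moving average of past gradients with a hypothetical decay `beta`, and the score sums the per-parameter absolute products `|g_i * m_i|` and divides by the norm of the sample gradient. The function names `ema_momentum` and `gmc_score` are likewise hypothetical and may not match the paper's definition or normalization.

```python
import numpy as np

def ema_momentum(grads, beta=0.9):
    """Momentum as an exponential moving average of past gradients (assumed form)."""
    m = np.zeros_like(grads[0], dtype=float)
    for g in grads:
        m = beta * m + (1.0 - beta) * g
    return m

def gmc_score(sample_grad, momentum, eps=1e-12):
    """Hypothetical GMC-style score.

    Sums the per-parameter absolute products |g_i * m_i| and divides by the
    gradient norm, so the score does not grow just because a sample's
    gradient is large. This normalization is an assumption, not the paper's.
    """
    coupling = np.abs(sample_grad * momentum).sum()
    return float(coupling / (np.linalg.norm(sample_grad) + eps))

g = np.array([1.0, -2.0, 0.5])

# Learnable pattern: past gradients keep pointing the same way, so momentum builds up.
steady = ema_momentum([g] * 50)

# Noise-like pattern: past gradients flip sign every step, so momentum mostly cancels.
flipping = ema_momentum([g if t % 2 == 0 else -g for t in range(50)])

score_steady = gmc_score(g, steady)
score_flipping = gmc_score(g, flipping)
print(score_steady > score_flipping)
```

Under these assumptions, the same sample gradient scores much higher against momentum built from consistent gradients than against momentum built from sign-flipping ones, which is the filtering effect the abstract attributes to momentum. A prediction-error signal would not make this distinction, since the gradient magnitude is identical in both cases.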