🤖 AI Summary
Adaptively balancing exploration and exploitation remains challenging when learning long-horizon, complex action sequences, largely because it is difficult to determine the right moment to trade one for the other. Method: This paper proposes a cognitive-uncertainty-driven adaptive exploration framework that quantifies dual uncertainty online, over both the environment model and the policy, to dynamically modulate exploration intensity and switching timing. It unifies diverse uncertainty sources (e.g., model prediction variance, policy confidence) within a modular architecture that supports plug-and-play integration of heterogeneous uncertainty estimators, and it incorporates intrinsic motivation to enable uncertainty-guided policy optimization. Results: Evaluated on multiple MuJoCo continuous-control benchmarks, the framework significantly outperforms baseline methods, including entropy regularization and Random Network Distillation, demonstrating superior effectiveness, robustness, and cross-task generalization.
📝 Abstract
Adaptive exploration methods learn complex policies by alternating between exploration and exploitation. An important question for such methods is when to switch from exploration to exploitation and vice versa. This is critical in domains that require learning long and complex sequences of actions. In this work, we present a generic adaptive exploration framework that employs uncertainty to address this question in a principled manner. Our framework includes previous adaptive exploration approaches as special cases. Moreover, it can incorporate any uncertainty-measuring mechanism of choice, for instance mechanisms used in intrinsic motivation or epistemic-uncertainty-based exploration methods. We experimentally demonstrate that our framework gives rise to adaptive exploration strategies that outperform standard ones across several MuJoCo environments.
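The abstract describes switching between exploration and exploitation based on an uncertainty signal. A minimal sketch of one such gate, assuming an ensemble-variance estimator of epistemic uncertainty and a fixed switching threshold; the function names, toy models, and threshold value are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def ensemble_uncertainty(models, state):
    """Epistemic uncertainty as the variance of an ensemble's predictions,
    averaged over output dimensions. Disagreement shrinks as models converge."""
    preds = np.array([m(state) for m in models])
    return preds.var(axis=0).mean()

def choose_mode(uncertainty, threshold):
    """Explore while the agent is uncertain; exploit once uncertainty falls."""
    return "explore" if uncertainty > threshold else "exploit"

# Toy ensemble: three linear "models" with slightly different weights,
# standing in for learned dynamics models trained on different data.
models = [lambda s, w=w: w * s for w in (0.9, 1.0, 1.1)]

state = np.array([1.0, 2.0])
u = ensemble_uncertainty(models, state)       # small disagreement -> low uncertainty
mode = choose_mode(u, threshold=0.05)         # -> "exploit"
```

Any other uncertainty estimator (e.g., a prediction-error bonus as in Random Network Distillation, or policy-entropy confidence) could be dropped in for `ensemble_uncertainty`, which is the plug-and-play property the abstract claims.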