🤖 AI Summary
Soft Actor-Critic (SAC) suffers from inefficient exploration and redundant sample collection in sparse-reward continuous control tasks. To address this, the authors propose KEA (Keeping Exploration Alive), a cooperative dual-agent exploration framework. KEA introduces a co-behavior agent that works alongside SAC, together with a state-dependent policy-switching mechanism that coordinates SAC's stochastic policy with novelty-based exploration: in high-novelty regions it preserves policy stochasticity to keep exploration alive, while in low-novelty regions it lets the agent converge toward the optimal policy. On sparse-reward tasks from the DeepMind Control Suite, KEA significantly improves learning efficiency and robustness compared to state-of-the-art novelty-based exploration baselines.
📝 Abstract
Soft Actor-Critic (SAC) has achieved notable success in continuous control tasks but struggles in sparse reward settings, where infrequent rewards make efficient exploration challenging. While novelty-based exploration methods address this issue by encouraging the agent to explore novel states, they are not trivial to apply to SAC. In particular, managing the interaction between novelty-based exploration and SAC's stochastic policy can lead to inefficient exploration and redundant sample collection. In this paper, we propose KEA (Keeping Exploration Alive), which tackles the inefficiencies of balancing exploration strategies when combining SAC with novelty-based exploration. KEA introduces an additional co-behavior agent that works alongside SAC, together with a switching mechanism that proactively coordinates the novelty-based exploration strategy with SAC's stochastic policy. This coordination allows the agent to maintain stochasticity in high-novelty regions, improving exploration efficiency and reducing repeated sample collection. We first analyze this issue in a 2D navigation task and then evaluate KEA on sparse reward control tasks from the DeepMind Control Suite. Compared to state-of-the-art novelty-based exploration baselines, our experiments show that KEA significantly improves learning efficiency and robustness in sparse reward setups.
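The state-dependent switching idea can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the count-based novelty estimator, the `0.5` threshold, and the function names are all assumptions standing in for whatever novelty module (e.g. an intrinsic-curiosity signal) and switching rule KEA actually uses.

```python
import numpy as np

class NoveltyEstimator:
    """Toy novelty signal: count-based visitation over a discretized state.
    (Illustrative stand-in for the paper's novelty module.)"""
    def __init__(self):
        self.counts = {}

    def novelty(self, state):
        key = tuple(np.round(state, 1))            # coarse state binning
        self.counts[key] = self.counts.get(key, 0) + 1
        return 1.0 / np.sqrt(self.counts[key])      # decays with visits

def switch_policy(state, sac_stochastic_action, co_agent_action,
                  estimator, threshold=0.5):
    """Hypothetical state-dependent switch: in high-novelty regions, keep
    SAC's stochastic action to maintain exploration; in low-novelty
    regions, defer to the co-behavior agent."""
    if estimator.novelty(state) > threshold:
        return sac_stochastic_action(state)   # keep exploration alive
    return co_agent_action(state)             # converge toward exploitation
```

In a training loop, `switch_policy` would be queried at every step, so the behavior policy shifts smoothly from exploratory to exploitative as regions of the state space become well-visited.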