🤖 AI Summary
Soft Actor-Critic (SAC) suffers from inefficient exploration and redundant sample collection in sparse-reward continuous control tasks. To address this, the authors propose KEA (Keeping Exploration Alive), a cooperative dual-agent exploration framework. KEA introduces a co-behavior agent that works alongside SAC, together with a state-dependent policy-switching mechanism that coordinates SAC's stochastic policy with novelty-based exploration: in high-novelty regions it preserves policy stochasticity to keep exploration alive, while in low-novelty regions it lets the agent converge toward the optimal policy. On sparse-reward tasks from the DeepMind Control Suite, KEA significantly improves learning efficiency and robustness compared to state-of-the-art novelty-based exploration baselines.
📝 Abstract
Soft Actor-Critic (SAC) has achieved notable success in continuous control tasks but struggles in sparse reward settings, where infrequent rewards make efficient exploration challenging. While novelty-based exploration methods address this issue by encouraging the agent to explore novel states, they are not trivial to apply to SAC. In particular, managing the interaction between novelty-based exploration and SAC's stochastic policy can lead to inefficient exploration and redundant sample collection. In this paper, we propose KEA (Keeping Exploration Alive), which tackles the inefficiencies of balancing exploration strategies when combining SAC with novelty-based exploration. KEA introduces an additional co-behavior agent that works alongside SAC, together with a switching mechanism that proactively coordinates the novelty-based exploration strategy with SAC's stochastic policy. This coordination allows the agent to maintain stochasticity in high-novelty regions, improving exploration efficiency and reducing repeated sample collection. We first analyze this issue in a 2D navigation task and then evaluate KEA on sparse reward control tasks from the DeepMind Control Suite. Compared to state-of-the-art novelty-based exploration baselines, our experiments show that KEA significantly improves learning efficiency and robustness in sparse reward setups.
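The state-dependent switching idea can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the count-based novelty estimator, the `0.5` threshold, and the function names are all assumptions standing in for whatever novelty module (e.g. an intrinsic-curiosity signal) and switching rule KEA actually uses.

```python
import numpy as np

class NoveltyEstimator:
    """Toy novelty signal: count-based visitation over a discretized state.
    (Illustrative stand-in for the paper's novelty module.)"""
    def __init__(self):
        self.counts = {}

    def novelty(self, state):
        key = tuple(np.round(state, 1))            # coarse state binning
        self.counts[key] = self.counts.get(key, 0) + 1
        return 1.0 / np.sqrt(self.counts[key])      # decays with visits

def switch_policy(state, sac_stochastic_action, co_agent_action,
                  estimator, threshold=0.5):
    """Hypothetical state-dependent switch: in high-novelty regions, keep
    SAC's stochastic action to maintain exploration; in low-novelty
    regions, defer to the co-behavior agent."""
    if estimator.novelty(state) > threshold:
        return sac_stochastic_action(state)   # keep exploration alive
    return co_agent_action(state)             # converge toward exploitation
```

In a training loop, `switch_policy` would be queried at every step, so the behavior policy shifts smoothly from exploratory to exploitative as regions of the state space become well-visited.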