🤖 AI Summary
This work addresses the challenge of efficiently exploring diverse high-reward regions during training in Generative Flow Networks (GFlowNets). To this end, the authors propose Adaptive Complementary Exploration (ACE), a novel algorithm that employs a dual GFlowNet architecture: a primary network learns the target distribution, while an exploration network focuses on high-reward regions insufficiently covered by the primary network. The two networks are trained jointly with the trajectory balance objective, guided by an adaptive reward mechanism that dynamically directs exploration toward underrepresented yet promising areas. ACE significantly improves the accuracy of approximating the target distribution and discovers diverse high-reward samples across multiple benchmark tasks, outperforming existing methods in both sample efficiency and diversity.
📝 Abstract
Generative Flow Networks (GFlowNets) are a flexible family of amortized samplers trained to generate discrete and compositional objects with probability proportional to a reward function. However, learning efficiency is constrained by the model's ability to rapidly explore diverse high-probability regions during training. To mitigate this issue, recent works have focused on incentivizing the exploration of unvisited and valuable states via curiosity-driven search and self-supervised random network distillation, which tend to waste samples on already well-approximated regions of the state space. In this context, we propose Adaptive Complementary Exploration (ACE), a principled algorithm for the effective exploration of novel and high-probability regions when learning GFlowNets. To achieve this, ACE introduces an exploration GFlowNet explicitly trained to search for high-reward states in regions underexplored by the canonical GFlowNet, which learns to sample from the target distribution. Through extensive experiments, we show that ACE significantly improves upon prior work in terms of approximation accuracy to the target distribution and discovery rate of diverse high-reward states.
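The trajectory balance objective mentioned above trains a GFlowNet so that, for any complete trajectory, the flow implied by the learned partition estimate and forward policy matches the reward times the backward policy. A minimal sketch of the per-trajectory loss (function name and toy values are illustrative, not from the paper):

```python
def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Squared trajectory-balance residual for one trajectory:

        (log Z + sum_t log P_F(s_t -> s_{t+1})
               - log R(x) - sum_t log P_B(s_t -> s_{t+1}))^2

    log_Z      : learned log partition estimate (scalar)
    log_pf     : forward-policy log-probs along the trajectory
    log_pb     : backward-policy log-probs along the trajectory
    log_reward : log R(x) of the terminal object x
    """
    residual = log_Z + sum(log_pf) - log_reward - sum(log_pb)
    return residual ** 2


# Toy example: a length-2 trajectory with made-up log-probabilities.
loss = trajectory_balance_loss(
    log_Z=1.0,
    log_pf=[-0.5, -0.7],
    log_pb=[-0.3, -0.4],
    log_reward=0.2,
)
```

In ACE, as the summary describes, two GFlowNets are trained jointly with this kind of objective: the primary network against the true reward, and the exploration network against an adaptively reshaped reward emphasizing regions the primary network undercovers.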