🤖 AI Summary
This work addresses the inefficiency of Generative Flow Networks (GFlowNets) in high-dimensional state spaces, where excessive exploration of low-reward regions hinders convergence. To mitigate this, the study introduces, for the first time, a combinatorial multi-armed bandit (CMAB) mechanism into GFlowNets, dynamically identifying and focusing on multiple high-scoring, compact subspaces. Coupled with an action-pruning strategy, this approach effectively suppresses unproductive exploration while preserving generation diversity. The proposed method significantly enhances the discovery efficiency of high-value candidate solutions. Experimental results across multiple tasks demonstrate that the generated samples consistently achieve substantially higher rewards than those produced by existing methods, confirming the dual advantages of improved sample quality and exploration efficiency.
📝 Abstract
As a probabilistic sampling framework, Generative Flow Networks (GFlowNets) show strong potential for constructing complex combinatorial objects through the sequential composition of elementary components. However, existing GFlowNets often suffer from excessive exploration over vast state spaces, leading to over-sampling of low-reward regions and convergence to suboptimal distributions. Effectively biasing GFlowNets toward high-reward solutions remains a non-trivial challenge. In this paper, we propose CMAB-GFN, which integrates a combinatorial multi-armed bandit (CMAB) framework with GFlowNet policies. The CMAB component prunes low-quality actions, yielding compact high-scoring subspaces for exploration. Restricting GFlowNet exploration to these subspaces accelerates the discovery of high-value candidates, while sampling across multiple subspaces ensures that diversity is not sacrificed. Experimental results on multiple tasks demonstrate that CMAB-GFN generates higher-reward candidates than existing approaches.
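To make the core idea concrete, here is a minimal sketch of how a combinatorial bandit could score actions and keep only a top-k "super-arm" as the allowed subspace. The paper's exact CMAB formulation is not given in the abstract, so this is a hypothetical UCB1-style illustration: the class name `CMABActionPruner`, the top-k selection rule, and the reward model in the usage example are all assumptions, not the authors' implementation.

```python
import math

class CMABActionPruner:
    """Hypothetical UCB1-style combinatorial bandit: each action is an arm,
    and the k highest-UCB actions form the pruned subspace (super-arm)
    that a downstream sampler (e.g., a GFlowNet policy) would be restricted to."""

    def __init__(self, n_actions, k):
        self.n = n_actions
        self.k = k
        self.counts = [0] * n_actions   # times each action was played
        self.means = [0.0] * n_actions  # empirical mean reward per action
        self.t = 0                      # total rounds so far

    def select_subspace(self):
        """Return the k actions with the highest UCB scores."""
        self.t += 1

        def ucb(a):
            if self.counts[a] == 0:
                # Unplayed actions get infinite score, forcing initial exploration.
                return float("inf")
            bonus = math.sqrt(2 * math.log(self.t) / self.counts[a])
            return self.means[a] + bonus

        return sorted(range(self.n), key=ucb, reverse=True)[:self.k]

    def update(self, action, reward):
        """Incremental mean update for one played action."""
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]


if __name__ == "__main__":
    # Toy environment (assumed for illustration): actions 0-2 yield reward 1.0,
    # the rest yield 0.0. The bandit should concentrate the subspace on 0-2.
    pruner = CMABActionPruner(n_actions=10, k=3)
    for _ in range(200):
        for a in pruner.select_subspace():
            pruner.update(a, 1.0 if a < 3 else 0.0)
    print(sorted(pruner.select_subspace()))
```

Under this toy reward model, the UCB bonus shrinks for repeatedly played low-reward actions, so they drop out of the selected subspace while the high-reward actions keep being exploited — the same pruning effect the abstract attributes to the CMAB component, in miniature.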