🤖 AI Summary
This work addresses the inefficiency of conventional Generative Flow Networks (GFlowNets) in settings where the reward function has unknown submodular structure, which leads to excessive reliance on costly reward queries. To overcome this limitation, the authors introduce submodularity into the GFlowNet training framework. By deriving upper bounds on the rewards of unobserved compositional objects and applying an optimism-in-the-face-of-uncertainty exploration strategy, the proposed method, SUBo-GFN, guides sampling toward high-reward regions more efficiently. Exploiting the structural properties of submodular functions, the approach generates abundant informative training data while drastically reducing the number of ground-truth reward evaluations required. Empirical results show that, for the same query budget, SUBo-GFN generates orders of magnitude more training data than classical GFlowNets on both synthetic and real-world submodular tasks, improving distribution matching and the quality of generated candidates.
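To make the key idea concrete: for a nonnegative submodular function f, submodularity implies f(A ∪ B) ≤ f(A) + f(B), so when f is also monotone, any unobserved set S contained in A ∪ B inherits the upper bound f(S) ≤ f(A) + f(B) from already-queried sets. The sketch below illustrates this with a toy coverage function (a standard monotone submodular example); it is an assumed illustration of the general principle, not the paper's exact bounding procedure.

```python
def coverage(universe_sets, S):
    """Coverage function: number of ground elements covered by the items in S.
    Coverage functions are monotone submodular."""
    covered = set()
    for i in S:
        covered |= universe_sets[i]
    return len(covered)

# Toy instance: each item covers a few ground elements.
universe_sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}}

A = frozenset({0, 1})      # observed: f(A) queried
B = frozenset({2, 3})      # observed: f(B) queried
S = frozenset({0, 2, 3})   # unobserved, but S is a subset of A ∪ B

f_A, f_B = coverage(universe_sets, A), coverage(universe_sets, B)
bound = f_A + f_B          # valid upper bound on f(S): monotonicity + submodularity
true_val = coverage(universe_sets, S)

assert true_val <= bound   # bound holds without ever querying f(S)
```

One such pair of observed queries can bound many unobserved subsets of A ∪ B at once, which is the source of the query savings the summary describes.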
📝 Abstract
Generative Flow Networks (GFlowNets; GFNs) are a class of generative models that learn to sample compositional objects in proportion to their reward, a value that is a priori unknown. We focus on the case where the reward has a specified, actionable structure, namely that it is submodular. We show that submodularity can be harnessed to retrieve upper bounds on the reward of compositional objects that have not yet been observed. We provide in-depth analyses of the probability of such bounds occurring, as well as of how many unobserved compositional objects a single bound can cover. Following the Optimism in the Face of Uncertainty principle, we then introduce SUBo-GFN, which uses the submodular upper bounds to train a GFN. We show that SUBo-GFN generates orders of magnitude more training data than classical GFNs for the same number of queries to the reward function. We demonstrate the effectiveness of SUBo-GFN in terms of distribution matching and high-quality candidate generation on synthetic and real-world submodular tasks.