🤖 AI Summary
This work addresses the challenges of training goal-conditioned GFlowNets under sparse rewards, high-dimensional state spaces, and offline-only data regimes. We propose Retrospective Backward Synthesis (RBS), a novel method that employs a learnable backward policy to dynamically generate high-quality, diverse backward trajectories, thereby mitigating reward sparsity and expanding trajectory coverage. Our key contributions are threefold: (i) the first integration of a retrospection mechanism into GFlowNets, which models the flows of goal-conditioned policies in both directions and re-synthesizes backward trajectories; (ii) support for joint multi-objective training; and (iii) importance weighting to improve training stability. Evaluated on multiple benchmark tasks, RBS achieves significant gains in sample efficiency and outperforms strong baselines in both the diversity of generated samples and goal-reaching accuracy.
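The summary mentions importance weighting but not its exact form. As a hedged illustration only, the sketch below assumes each trajectory (on-policy or synthesized) is scored with the standard GFlowNet trajectory balance (TB) residual, log Z(g) + Σ log P_F − log R(x|g) − Σ log P_B, and that the batch loss is a weight-normalised mean of squared residuals. The `Trajectory` container, its `weight` field, and the normalisation by total weight are hypothetical choices, not the paper's formulation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """One goal-conditioned trajectory s_0 -> ... -> s_n = x (hypothetical container)."""
    log_pf: List[float]  # log P_F(s_{t+1} | s_t, g) for each step
    log_pb: List[float]  # log P_B(s_t | s_{t+1}, g) for each step
    log_reward: float    # log R(x | g); failed goals need a floor such as log(1e-8)
    weight: float = 1.0  # assumed importance weight (1.0 for ordinary on-policy samples)


def tb_residual(traj: Trajectory, log_z: float) -> float:
    """Trajectory-balance residual: log Z(g) + sum log P_F - log R(x|g) - sum log P_B."""
    return log_z + sum(traj.log_pf) - traj.log_reward - sum(traj.log_pb)


def weighted_tb_loss(batch: List[Trajectory], log_z: float) -> float:
    """Weighted mean of squared TB residuals; assumes the whole batch shares one goal g."""
    total_w = sum(t.weight for t in batch)
    return sum(t.weight * tb_residual(t, log_z) ** 2 for t in batch) / total_w


if __name__ == "__main__":
    log_z = math.log(2.0)
    # On-policy sample whose flows already balance: its residual is 0.
    on_policy = Trajectory(log_pf=[math.log(0.5)], log_pb=[0.0], log_reward=0.0)
    # Synthesized sample with an (assumed) down-weighting factor.
    synthesized = Trajectory(log_pf=[0.0], log_pb=[0.0], log_reward=0.0, weight=0.5)
    print(weighted_tb_loss([on_policy, synthesized], log_z))
```

Normalising by the total weight keeps the loss scale independent of how many synthesized samples are mixed into a batch; other schemes (for example, unnormalised weighting) are equally plausible readings of the summary.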
📝 Abstract
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a stochastic policy to sequentially generate compositional objects with probabilities proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard return-maximizing reinforcement learning approaches, which often converge to a single optimal solution. Recent work has explored goal-conditioned GFlowNets, which acquire various useful properties by training a single GFlowNet capable of reaching different goals as the task specifies. However, training a goal-conditioned GFlowNet is challenging because rewards are extremely sparse, a problem that is further exacerbated in large state spaces. In this work, we propose a novel method named Retrospective Backward Synthesis (RBS) to address these challenges. Specifically, RBS uses the backward policy of the GFlowNet to synthesize new backward trajectories, enriching the training trajectories with higher quality and diversity and thereby efficiently addressing the sparse-reward problem. Extensive empirical results show that our method improves sample efficiency by a large margin and outperforms strong baselines on various standard evaluation benchmarks.
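The abstract does not spell out the synthesis procedure, so the following is a minimal sketch of why backward synthesis helps with sparse goal-conditioned rewards, under stated assumptions. It uses a hypothetical toy hypergrid: the start state s_0 is all zeros, each forward action increments one coordinate or stops, and the reward is 1 only when the terminal state equals the goal. Every backward step moves to a parent whose coordinate sum is one lower, so a trajectory sampled backward from the goal with P_B must reach s_0. Reversing it yields a forward trajectory that ends at the goal and earns the nonzero reward, whereas uniform forward exploration rarely does. The uniform backward policy stands in for the GFlowNet's (possibly learned) P_B, and `parents`, `synthesize_backward`, `forward_rollout`, and the grid sizes are illustrative names, not the paper's.

```python
import random
from typing import List, Tuple

State = Tuple[int, ...]


def parents(state: State) -> List[State]:
    """States that reach `state` with one forward action (increment one coordinate)."""
    return [state[:i] + (v - 1,) + state[i + 1:] for i, v in enumerate(state) if v > 0]


def synthesize_backward(goal: State, rng: random.Random) -> List[State]:
    """Start at the goal, sample parents with a uniform P_B until s_0, return forward order."""
    traj = [goal]
    while any(traj[-1]):  # s_0 is the all-zeros state
        traj.append(rng.choice(parents(traj[-1])))
    traj.reverse()
    return traj


def sparse_reward(x: State, goal: State) -> float:
    """Goal-conditioned sparse reward: 1 only if the terminal state equals the goal."""
    return 1.0 if x == goal else 0.0


def forward_rollout(dim: int, side: int, rng: random.Random) -> List[State]:
    """Uniform forward policy: increment a coordinate still below side - 1, or stop."""
    s: State = (0,) * dim
    traj = [s]
    while True:
        moves = [i for i in range(dim) if s[i] < side - 1]
        a = rng.randrange(len(moves) + 1)  # the extra index is the stop action
        if a == len(moves):
            return traj
        i = moves[a]
        s = s[:i] + (s[i] + 1,) + s[i + 1:]
        traj.append(s)


rng = random.Random(0)
dim, side, goal = 4, 8, (5, 2, 7, 3)
n = 2000
hit_rate = sum(sparse_reward(forward_rollout(dim, side, rng)[-1], goal) for _ in range(n)) / n
synth = synthesize_backward(goal, rng)
print(f"forward hit rate: {hit_rate:.4f}")
print(f"synthesized: {len(synth)} states, reward {sparse_reward(synth[-1], goal)}")
```

Because P_B is stochastic, repeated calls to `synthesize_backward` return different paths to the same goal, which is one way to read the abstract's claim of enriching training trajectories with diversity. How RBS scores, weights, or filters these samples during training is not specified here.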