🤖 AI Summary
This work addresses the instability commonly encountered when training Generative Flow Networks (GFlowNets), which manifests as severe loss spikes and mode collapse and hinders accurate learning of the target distribution. The study first analyzes the sensitivity of GFlowNet objectives, showing that a small total variation (TV) distance between the learned and target distributions does not rule out an unbounded training loss. It then establishes converse guarantees: loss-to-TV bounds under which a bounded trajectory balance loss certifies that the learned distribution is close to the target. Building on these results, the authors propose Stable GFlowNets, a training algorithm that uses the theory to stabilize training. Empirically, the method improves training behavior and achieves superior distributional fidelity.
📝 Abstract
Generative Flow Networks (GFlowNets) learn to sample states with probability proportional to an unnormalized reward. Despite their theoretical promise, practical training is often unstable, exhibiting severe loss spikes and mode collapse. To tackle this, we first assess the sensitivity of GFlowNet objectives, demonstrating that a small Total Variation (TV) distance between the learned and target distributions does not preclude unbounded training loss. Motivated by this mismatch, we establish converse guarantees by deriving loss-to-TV bounds that certify global fidelity from bounded trajectory balance losses. Finally, we propose Stable GFlowNets, an algorithm that leverages our theoretical results to stabilize training, and empirically demonstrate improved training behavior and superior distributional fidelity.
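
The gap between TV distance and training loss can be illustrated with a toy calculation. The sketch below is not the paper's construction. It assumes a hypothetical one-step GFlowNet with two terminal states reached directly from the root, so the backward policy is trivially 1 and the standard trajectory balance residual reduces to log Z + log P_F(x) - log R(x). The names `tv_distance`, `tb_loss_one_step`, and `compare`, and the values of `eps` and `delta`, are illustrative choices only.

```python
import math


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def tb_loss_one_step(log_z, log_pf, log_reward):
    """Squared trajectory balance residual for a one-step trajectory.

    Assumes the backward policy probability is 1 (a single parent state).
    """
    return (log_z + log_pf - log_reward) ** 2


# Toy setup (assumption): two terminal states, one of them rare under the target.
eps = 1e-3
rewards = [1.0 - eps, eps]          # unnormalized rewards
log_z = math.log(sum(rewards))      # use the true log-partition function
target = [r / sum(rewards) for r in rewards]


def compare(delta):
    """Return (TV distance, TB loss on the rare state) when the learned
    sampler assigns probability delta to the rare state."""
    learned = [1.0 - delta, delta]
    tv = tv_distance(target, learned)
    loss_rare = tb_loss_one_step(log_z, math.log(learned[1]), math.log(rewards[1]))
    return tv, loss_rare


if __name__ == "__main__":
    for delta in (1e-6, 1e-12, 1e-30):
        tv, loss = compare(delta)
        print(f"delta={delta:.0e}  TV={tv:.2e}  TB loss on rare state={loss:.1f}")
```

In this toy setting the TV distance can never exceed roughly `eps`, however small `delta` becomes, while the squared log-ratio on the rare state grows without bound as `delta` shrinks. This is one simple way a loss evaluated on individual trajectories can spike even though the learned and target distributions are close in TV.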