$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Off-policy training of generative models calls for a surrogate loss that has low variance, gradients that match a chosen divergence, and a global optimum that does not depend on where the training samples come from. This work proposes a family of loss functions based on $f$-divergences and establishes a one-to-one correspondence between $f$-divergences and translation-invariant loss functions on the target and model log probabilities. Through this correspondence the losses inherit properties of the corresponding $f$-divergence, such as stronger mode coverage, while remaining usable with off-policy data. On-policy, the gradients of each loss match those of its $f$-divergence; off-policy, the loss keeps the same global minimizer. This makes the losses applicable to GFlowNets, variational inference, and large language model (LLM) fine-tuning. Experiments on synthetic tasks, SynFlowNets for molecule discovery, and asynchronous LLM tuning indicate that the models retain their predicted properties both on- and off-policy.
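
For readers unfamiliar with the term, the standard textbook definition of an $f$-divergence is sketched below. The notation ($P$ for the target, $Q$ for the model, densities $p$ and $q$) and the argument order are assumptions made for illustration; the paper may use a different convention.

```latex
% f-divergence between P and Q, for convex f with f(1) = 0
D_f(P \,\|\, Q) = \mathbb{E}_{x \sim Q}\!\left[ f\!\left( \frac{p(x)}{q(x)} \right) \right]

% Common special cases under this convention:
%   f(t) = t \log t   gives  KL(P || Q)  (forward KL, tends to be mode covering when P is the target)
%   f(t) = -\log t    gives  KL(Q || P)  (reverse KL, tends to be mode seeking)
%   f(t) = (t - 1)^2  gives  the Pearson chi-squared divergence
```

Choosing a different $f$ therefore changes how strongly the model is penalized for missing regions where the target has mass, which is the sense in which a loss can be "more mode covering".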

📝 Abstract

In GFlowNets and variational inference, it has been shown that the mean squared error between target and model log probabilities is an effective, low-variance surrogate loss for training generative models. This loss has the property that when evaluated on-policy its gradients correspond to those of the KL divergence, while off-policy it remains a valid loss with the same global minimizer. In this work, we demonstrate that this construction can be extended to the whole family of $f$-divergences, leading to a family of losses whose on-policy gradients are those of the corresponding $f$-divergence, but which retain the same global minimizer off-policy. Specifically, we show that the on-policy gradients lead to a one-to-one correspondence between translation-invariant loss functions on the target and model log probabilities, and $f$-divergences. This equivalence allows us to design new surrogate loss functions for tuning a wide class of generative models that inherit the properties of the corresponding $f$-divergence, such as being more mode covering, whilst being applicable to off-policy data. We apply our losses to a range of tasks, including classic synthetic examples, SynFlowNets for molecule discovery, and asynchronous large language model (LLM) tuning, demonstrating that our models retain their predicted properties on- and off-policy in a wide class of generative models.
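
The KL special case described in the first two sentences of the abstract can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's implementation: it assumes a small categorical model parameterized by softmax logits, a normalized target, and finite-difference gradients, and every function name in it is hypothetical. It does not reproduce the general $f$-divergence losses.

```python
import numpy as np

def log_softmax(theta):
    """Log probabilities of a categorical model with logits theta."""
    z = theta - theta.max()
    return z - np.log(np.exp(z).sum())

def mse_surrogate(theta, log_p, mu):
    """0.5 * E_{x~mu}[(log q_theta(x) - log p(x))^2], with the sampling
    distribution mu held fixed (no gradient flows through it)."""
    delta = log_softmax(theta) - log_p
    return 0.5 * np.sum(mu * delta**2)

def reverse_kl(theta, log_p):
    """KL(q_theta || p) for the categorical model."""
    log_q = log_softmax(theta)
    return np.sum(np.exp(log_q) * (log_q - log_p))

def num_grad(fn, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar function."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (fn(theta + step) - fn(theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
K = 5                                     # assumed toy state-space size
log_p = log_softmax(rng.normal(size=K))   # normalized toy target
theta = rng.normal(size=K)                # current model logits

# On-policy: samples come from the current model, treated as a constant.
mu_on = np.exp(log_softmax(theta))
g_mse_on = num_grad(lambda t: mse_surrogate(t, log_p, mu_on), theta)
g_kl = num_grad(lambda t: reverse_kl(t, log_p), theta)
on_policy_match = np.allclose(g_mse_on, g_kl, atol=1e-6)

# Off-policy: samples come from a uniform distribution instead.
# The loss is still zero, with zero gradient, when the model equals the target.
mu_off = np.full(K, 1.0 / K)
loss_at_opt = mse_surrogate(log_p, log_p, mu_off)
g_at_opt = num_grad(lambda t: mse_surrogate(t, log_p, mu_off), log_p)

print("on-policy gradient matches KL gradient:", on_policy_match)
print("off-policy loss at the target:", loss_at_opt)
```

The on-policy comparison works because the score function has zero mean under the model, so differentiating the squared log-ratio with the sampling distribution frozen yields the same expression as differentiating the KL divergence directly. With the uniform sampling distribution, the gradient at a generic point is no longer the KL gradient, but the minimizer is unchanged because the loss is a weighted sum of squares that vanishes only when model and target log probabilities agree on the support of the sampler.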

Problem

Research questions and friction points this paper is trying to address.

f-divergence

GFlowNets

generative models

off-policy learning

surrogate loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

f-divergence

GFlowNets

surrogate loss