Rooted Absorbed Prefix Trajectory Balance with Submodular Replay for GFlowNet Training

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses mode collapse in GFlowNet-based sequence generation, which manifests as prefix collapse and length bias. The authors attribute it to insufficient credit assignment to early prefixes and to replay-induced distributional bias. They propose the RapTB objective, which anchors sub-trajectories at the root node and uses an absorbed-suffix backup mechanism to propagate terminal rewards back to intermediate prefixes, providing dense prefix-level supervision. They also introduce SubM, a submodular replay refresh strategy that promotes sample diversity while retaining high-reward trajectories, thereby mitigating replay-induced distribution shift. On SMILES-based molecular generation, RapTB combined with SubM consistently improves optimization performance and molecular diversity while preserving high validity.
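For context, RapTB builds on the standard trajectory balance (TB) objective. The sketch below shows only that baseline for a single autoregressively generated sequence; it is not the RapTB objective, whose exact form is not given on this page. The function name, argument names, and toy numbers are assumptions chosen for illustration.

```python
import math


def trajectory_balance_loss(log_z, token_log_probs, log_reward):
    """Squared trajectory balance residual for one complete sequence.

    In left-to-right generation every prefix has exactly one parent,
    so the backward-policy log-probability is zero and is omitted.
    """
    residual = log_z + sum(token_log_probs) - log_reward
    return residual ** 2


# Toy values (hypothetical): the policy samples the sequence with
# probability 0.5 * 0.25 = 0.125, which matches the reward when Z = 1.
token_log_probs = [math.log(0.5), math.log(0.25)]
loss = trajectory_balance_loss(
    log_z=0.0,
    token_log_probs=token_log_probs,
    log_reward=math.log(0.125),
)
```

The residual vanishes (up to floating-point error) exactly when the sampling probability of the sequence equals its reward divided by the partition function. Because the reward enters only once per complete trajectory, early prefixes receive an indirect learning signal, which is the credit-assignment weakness the summary describes.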

📝 Abstract
Generative Flow Networks (GFlowNets) enable fine-tuning large language models to approximate reward-proportional posteriors, but they remain prone to mode collapse, manifesting as prefix collapse and length bias. We attribute this to two factors: (i) weak credit assignment to early prefixes, and (ii) biased replay that induces a shifted, non-representative training flow distribution. We propose Rooted absorbed prefix Trajectory Balance (RapTB), an objective that anchors subtrajectory supervision at the root and propagates terminal rewards to intermediate prefixes via absorbed suffix-based backups, providing dense prefix-level learning signals. To mitigate replay-induced distribution shift, we further introduce SubM, a submodular replay refresh strategy that promotes both high reward and diversity. Empirically, on tasks such as LLM-based molecule generation using SMILES strings, RapTB combined with SubM consistently improves optimization performance and molecular diversity while preserving high validity.
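To make the idea of a submodular replay refresh more concrete, here is a minimal sketch of greedy selection under a monotone submodular score. The abstract does not specify the function SubM uses, so everything here is an assumption: the score (a modular reward term plus a facility-location coverage term), the character-bigram Jaccard similarity, and the names `jaccard`, `greedy_replay_selection`, and `diversity_weight` are all hypothetical. Rewards are assumed to be non-negative.

```python
def jaccard(a, b):
    """Jaccard similarity between the character-bigram sets of two strings."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 1.0


def greedy_replay_selection(candidates, rewards, k, diversity_weight=1.0):
    """Greedily pick up to k candidate indices that maximize

        f(S) = sum_{j in S} reward_j + w * sum_i max_{j in S} sim(i, j),

    i.e. a modular reward term plus a facility-location coverage term.
    """
    n = len(candidates)
    sim = [[jaccard(candidates[i], candidates[j]) for j in range(n)]
           for i in range(n)]
    coverage = [0.0] * n  # best similarity of each candidate to the chosen set
    selected = []
    for _ in range(min(k, n)):
        best_idx, best_gain = None, float("-inf")
        for j in range(n):
            if j in selected:
                continue
            # Marginal coverage gain from adding candidate j.
            cover_gain = sum(max(sim[i][j] - coverage[i], 0.0) for i in range(n))
            gain = rewards[j] + diversity_weight * cover_gain
            if gain > best_gain:
                best_idx, best_gain = j, gain
        selected.append(best_idx)
        coverage = [max(coverage[i], sim[i][best_idx]) for i in range(n)]
    return selected


# Toy buffer (hypothetical): two identical high-reward strings and one
# structurally different string with a slightly lower reward.
buffer = ["CCO", "CCO", "c1ccccc1"]
picked = greedy_replay_selection(buffer, rewards=[1.0, 1.0, 0.9], k=2)  # -> [0, 2]
```

In the toy buffer the duplicate of the first string adds no new coverage, so the second pick goes to the dissimilar string despite its lower reward. This is the reward-versus-diversity trade-off the abstract attributes to SubM, shown here with a stand-in score rather than the authors' own.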
Problem

Research questions and friction points this paper is trying to address.

mode collapse
prefix collapse
length bias
GFlowNets
reward-proportional posterior
Innovation

Methods, ideas, or system contributions that make the work stand out.

GFlowNet
Trajectory Balance
submodular replay
prefix collapse
credit assignment