🤖 AI Summary
Molecular graph generation is inherently a discrete classification task, yet mainstream continuous diffusion models formulate it as regression and rely on post-hoc rounding at inference. This mismatch between the training objective and the inference procedure leads to overfitting, reduced diversity, and limited generalization. To address it, the authors propose GraphBFN, the first hierarchical Bayesian Flow Network for molecular graphs, which operates directly in distribution-parameter space. GraphBFN aligns the training loss with discrete sampling via cumulative distribution function (CDF)-based rounding, ensuring training-inference consistency, and enables progressive generation from coarse topological structure to fine-grained atom and bond attributes. On the QM9 and ZINC250k benchmarks, GraphBFN achieves state-of-the-art performance: fast generation, chemical validity above 98%, and significantly improved diversity.
📝 Abstract
Molecular graph generation is essentially a classification problem: predicting the categories of atoms and bonds. However, prevailing paradigms such as continuous diffusion models are trained to predict continuous numerical values, treating training as a regression task. Final generation then requires a rounding step to convert these predictions back into discrete categories, which is intrinsically a classification operation. Because rounding is not incorporated during training, there is a significant discrepancy between the model's training objective and its inference procedure. The resulting emphasis on point-wise precision encourages overfitting and inefficient learning, since considerable capacity is spent capturing intra-bin variations that are ultimately irrelevant to the discrete nature of the task. This flaw diminishes molecular diversity and constrains the model's generalization. To address this fundamental limitation, we propose GraphBFN, a novel hierarchical coarse-to-fine framework based on Bayesian Flow Networks that operates on the parameters of distributions. By introducing the cumulative distribution function (CDF), GraphBFN computes the probability of selecting the correct category, thereby unifying the training objective with the rounding operation used at sampling time. We demonstrate that our method achieves superior performance and faster generation, setting new state-of-the-art results on the QM9 and ZINC250k molecular graph generation benchmarks.
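To make the core idea concrete, here is a minimal sketch of CDF-based rounding for a single categorical variable. It assumes the model emits a Gaussian prediction N(mu, sigma²) over a continuous axis and that integer class k occupies the bin [k − 0.5, k + 0.5); the function names, bin edges, and loss form are illustrative assumptions, not GraphBFN's actual implementation.

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a Gaussian N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def category_probabilities(mu: float, sigma: float, num_classes: int) -> list[float]:
    """Probability that a continuous prediction N(mu, sigma^2) rounds to each
    integer class in {0, ..., num_classes - 1}. Bin k covers [k - 0.5, k + 0.5);
    the two outer bins absorb the Gaussian tails, so the masses sum to 1."""
    probs = []
    for k in range(num_classes):
        cdf_lo = 0.0 if k == 0 else normal_cdf(k - 0.5, mu, sigma)
        cdf_hi = 1.0 if k == num_classes - 1 else normal_cdf(k + 0.5, mu, sigma)
        probs.append(cdf_hi - cdf_lo)
    return probs

def rounding_aware_loss(mu: float, sigma: float,
                        target_class: int, num_classes: int) -> float:
    """Negative log-probability of rounding to the correct class: a
    classification-style objective, unlike squared error on mu, which also
    penalizes intra-bin deviations that rounding would discard anyway."""
    p = category_probabilities(mu, sigma, num_classes)[target_class]
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)
```

For example, with `mu = 1.0`, `sigma = 0.5`, and 4 classes, the mass of bin 1 is Φ(1) − Φ(−1) ≈ 0.68, so `rounding_aware_loss` is small for `target_class = 1` and large for `target_class = 3`. Two regression predictions landing at different points inside the same bin incur identical loss, which is exactly the training-inference consistency the abstract describes.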