🤖 AI Summary
This work addresses the trade-off between physical realism and sampling efficiency in molecular conformation generation. Machine-learned force fields (MLFFs) relax geometries along physical forces but require costly ab initio training data, while diffusion models learn from equilibrium data alone but depend on noise schedules and explicit timestep conditioning. The authors propose Generative Pseudo-Force Fields (GPFFs), which train an MLFF on a quadratic pseudo-potential energy surface defined around reference equilibrium geometries. Non-equilibrium training data can then be generated on the fly by perturbing the equilibria with Gaussian noise, with no ab initio evaluations for the perturbed conformations. Because the magnitude of the predicted pseudo-forces implicitly encodes the noise level, a GPFF acts as a timestep-agnostic variant of a variance-exploding diffusion model and needs no timestep conditioning. It can replace the score model in standard diffusion samplers (ancestral, Heun), and it also enables adaptive variants and an MLFF-inspired direct denoising scheme, all of which support arbitrary structural priors and geometric constraints. On QM9, GPFF achieves over 50% validity with only 6 neural function evaluations (NFE) and reaches 100% validity at 256 NFE, outperforming diffusion baselines across all samplers. Combined with custom priors, the method is demonstrated in a molecular editor for drug design that generates molecules in real time.
📝 Abstract
Generating stable molecular conformations typically forces a tradeoff between the physical realism of energy-based relaxation and the sampling efficiency of data-driven generative models. While machine learning force fields (MLFFs) can sample stable conformations by relaxing molecular geometries according to physical forces, they require costly ab initio training data. Conversely, diffusion models (DMs) learn from equilibrium data alone but depend on noise schedules and time-step conditioning. In this work, we propose generative pseudo-force fields (GPFFs) to bridge these paradigms by training an MLFF on a quadratic pseudo-potential energy surface defined relative to reference equilibrium structures. Because no ab initio calculations are required for the perturbed geometries, non-equilibrium training data can be generated on the fly by perturbing the equilibria with Gaussian noise. We show that GPFFs constitute a time-step-agnostic variant of variance-exploding DMs: the score is obtained from the predicted pseudo-forces, but because force magnitudes implicitly encode the noise level, no time-step conditioning is needed. A GPFF can hence be used as a drop-in replacement in standard diffusion sampling (ancestral, Heun), and it also facilitates more efficient adaptive variants and an MLFF-inspired direct denoising scheme. Our proposed sampling algorithms support arbitrary structural priors and geometric constraints. On QM9, GPFF reaches 100% validity at 256 neural function evaluations (NFE) and over 50% at just 6 NFE, outperforming diffusion baselines across all samplers. Combined with custom priors, we showcase the fast and accurate generation of our method in a molecular editor for a drug design setting, where a molecule is generated in real time.
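The on-the-fly data generation described above can be illustrated with a minimal sketch. It is not the authors' implementation. The spring constant `k`, the log-uniform sampling of the noise scale, the absence of any rigid alignment between noisy and reference geometry, and all function names are assumptions made for illustration. The sketch builds only the training targets; the neural force field that would be fitted to them is omitted. With `k = 1`, the pseudo-force equals `sigma**2` times the conditional score of a variance-exploding perturbation, and its root-mean-square magnitude is close to `sigma`, which is one way to see how force magnitudes can carry the noise level without a time-step input.

```python
import math
import random


def perturb(equilibrium, sigma, rng):
    """Add isotropic Gaussian noise of scale sigma to every coordinate."""
    return [[c + rng.gauss(0.0, sigma) for c in atom] for atom in equilibrium]


def pseudo_energy(coords, equilibrium, k=1.0):
    """Quadratic pseudo-potential: 0.5 * k * sum of squared displacements."""
    return 0.5 * k * sum(
        (c - c0) ** 2
        for atom, atom0 in zip(coords, equilibrium)
        for c, c0 in zip(atom, atom0)
    )


def pseudo_forces(coords, equilibrium, k=1.0):
    """Negative gradient of the pseudo-energy: F = -k * (x - x0)."""
    return [
        [-k * (c - c0) for c, c0 in zip(atom, atom0)]
        for atom, atom0 in zip(coords, equilibrium)
    ]


def rms_force(forces):
    """Root-mean-square force component over all atoms and axes."""
    n = sum(len(atom) for atom in forces)
    return math.sqrt(sum(f * f for atom in forces for f in atom) / n)


def make_training_example(equilibrium, sigma_min, sigma_max, rng):
    """Draw one (noisy geometry, target pseudo-forces) pair.

    The noise scale is sampled log-uniformly (an assumption). It is returned
    for inspection only; a time-step-agnostic model would not receive it.
    """
    sigma = math.exp(rng.uniform(math.log(sigma_min), math.log(sigma_max)))
    noisy = perturb(equilibrium, sigma, rng)
    return noisy, pseudo_forces(noisy, equilibrium), sigma


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical three-atom reference geometry (coordinates in angstrom).
    x0 = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
    for _ in range(3):
        noisy, forces, sigma = make_training_example(x0, 0.01, 1.0, rng)
        print(f"sigma={sigma:.3f}  rms_force={rms_force(forces):.3f}  "
              f"energy={pseudo_energy(noisy, x0):.4f}")
```

Because the targets depend only on the noisy geometry and its reference, fresh pairs can be drawn at every training step at negligible cost, which is what removes the need for ab initio labels on perturbed conformations in this setting.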