🤖 AI Summary
Existing E(3)-equivariant diffusion models for 3D molecular conformation generation are prone to biases from low-accuracy training data and struggle to approximate the thermodynamic equilibrium distribution governed by high-fidelity Hamiltonians. This work proposes Elign, a framework that integrates physics-based guidance entirely into the training phase. By leveraging a pretrained machine learning force field to replace costly quantum chemical computations, Elign introduces the FED-GRPO algorithm, which enhances reinforcement learning reward signals through force-energy disentanglement and group-normalized optimization. The approach achieves significantly lower DFT-computed energies and forces in generated conformations, improving structural stability and physical consistency, while maintaining inference speed comparable to unguided sampling.
📝 Abstract
Generative models for 3D molecular conformations must respect Euclidean symmetries and concentrate probability mass on thermodynamically favorable, mechanically stable structures. However, E(3)-equivariant diffusion models often reproduce biases from semi-empirical training data rather than capturing the equilibrium distribution of a high-fidelity Hamiltonian. While physics-based guidance can correct this, it faces two computational bottlenecks: expensive quantum-chemical evaluations (e.g., DFT) and the need to repeat such queries at every sampling step. We present Elign, a post-training framework that amortizes both costs. First, we replace expensive DFT evaluations with a faster, pretrained foundational machine-learning force field (MLFF) to provide physical signals. Second, we eliminate repeated run-time queries by shifting physical steering to the training phase. To achieve the second amortization, we formulate reverse diffusion as a reinforcement learning problem and introduce Force-Energy Disentangled Group Relative Policy Optimization (FED-GRPO) to fine-tune the denoising policy. FED-GRPO includes a potential-based energy reward and a force-based stability reward, which are optimized and group-normalized independently. Experiments show that Elign generates conformations with lower gold-standard DFT energies and forces, while improving stability. Crucially, inference remains as fast as unguided sampling, since no energy evaluations are required during generation.
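The abstract's core mechanism, group-normalizing the energy and force reward channels independently before combining them, can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the reward definitions (negative MLFF potential energy, negative mean force magnitude) and the function name `fed_grpo_advantages` are assumptions made for clarity.

```python
import numpy as np

def fed_grpo_advantages(energies, force_norms, eps=1e-8):
    """Sketch of force-energy disentangled, group-relative advantages.

    energies:    (G,) MLFF potential energies of G sampled conformations
                 in one group (lower is better).
    force_norms: (G,) mean atomic force magnitudes (lower = more stable).
    Returns one advantage per conformation; each reward channel is
    normalized against its own group statistics before being summed,
    so neither signal dominates due to differing scales.
    """
    # Assumed reward shaping: lower energy / smaller forces -> higher reward.
    r_energy = -np.asarray(energies, dtype=float)
    r_force = -np.asarray(force_norms, dtype=float)

    # Group-relative normalization, applied to each channel independently
    # (the "disentangled" part): subtract the group mean, divide by the
    # group standard deviation.
    a_energy = (r_energy - r_energy.mean()) / (r_energy.std() + eps)
    a_force = (r_force - r_force.mean()) / (r_force.std() + eps)
    return a_energy + a_force
```

In this sketch, the conformation with both the lowest energy and the smallest forces in its group receives the largest combined advantage, and each group's advantages sum to approximately zero, as in standard GRPO.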