🤖 AI Summary
To address the scarcity of high-quality labeled data and the limited structural stability and accuracy of generated molecules, this paper proposes a quantum-guided diffusion framework whose guidance signal needs no trained property predictor or conditional model. The method treats first-principles quantum chemical calculations (e.g., DFT-based force fields) as a non-differentiable oracle and feeds gradients estimated from that oracle directly into the sampling process of an unconditional diffusion model. This enables conditional generation toward objectives such as energy minimization and geometric validity without labeled supervision, and the oracle guidance can also be combined with explicit or implicit guidance to optimize molecular properties and stability jointly. By avoiding reliance on large annotated datasets or differentiable surrogate models, the approach significantly reduces atomic forces in generated molecules, which improves structural plausibility, and it generalizes to molecular optimization tasks beyond stability optimization.
📝 Abstract
Recent advances in diffusion models have shown remarkable potential in the conditional generation of novel molecules. These models can be guided in two ways: (i) explicitly, through additional features representing the condition, or (ii) implicitly, using a property predictor. However, training property predictors or conditional diffusion models requires an abundance of labeled data and is inherently challenging in real-world applications. We propose a novel approach that reduces the dependence on large labeled datasets by leveraging domain knowledge from quantum chemistry as a non-differentiable oracle to guide an unconditional diffusion model. Instead of relying on neural networks, the oracle provides accurate guidance in the form of estimated gradients, allowing the diffusion process to sample from a conditional distribution specified by quantum chemistry. We show that this results in more precise conditional generation of novel and stable molecular structures. Our experiments demonstrate that our method: (1) significantly reduces atomic forces, enhancing the validity of generated molecules when used for stability optimization; (2) is compatible with both explicit and implicit guidance in diffusion models, enabling joint optimization of molecular properties and stability; and (3) generalizes effectively to molecular optimization tasks beyond stability optimization.
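The idea of guiding a sampler with gradients estimated from a non-differentiable oracle can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how the gradients are estimated, so central finite differences are used here as one common choice, `toy_energy` is a hypothetical harmonic pair potential standing in for a real quantum chemistry code, and `guided_update`, `scale`, and `eps` are invented names rather than the paper's implementation. Since atomic forces are the negative gradient of the energy, stepping down the estimated gradient is what drives the forces toward zero.

```python
import numpy as np

def toy_energy(coords, r0=1.0):
    """Hypothetical black-box oracle: harmonic energy over all atom pairs.

    Stands in for a quantum chemistry calculation; only its scalar
    output is used, never its internals.
    """
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(coords), k=1)  # each pair counted once
    return 0.5 * np.sum((dists[upper] - r0) ** 2)

def estimate_gradient(oracle, coords, eps=1e-4):
    """Central finite-difference estimate of d(oracle)/d(coords)."""
    grad = np.zeros_like(coords)
    for idx in np.ndindex(coords.shape):
        step = np.zeros_like(coords)
        step[idx] = eps
        grad[idx] = (oracle(coords + step) - oracle(coords - step)) / (2 * eps)
    return grad

def guided_update(proposal, oracle, scale=0.1):
    """Shift a sampler's proposed coordinates down the estimated gradient.

    In a diffusion sampler, `proposal` would be the output of one
    reverse step of the unconditional model (assumed, not shown here).
    """
    return proposal - scale * estimate_gradient(oracle, proposal)

# Toy usage: random coordinates play the role of the sampler's proposals.
rng = np.random.default_rng(0)
start = rng.normal(size=(3, 3))  # 3 atoms in 3D
coords = start.copy()
for _ in range(200):
    coords = guided_update(coords, toy_energy)
```

Finite differences need two oracle calls per coordinate, which would be expensive with a real quantum chemistry code; an oracle that already returns forces would supply the same guidance signal in a single call.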