🤖 AI Summary
Existing molecular generative models often struggle to produce high-quality molecules, particularly in conditional settings where specific properties must be satisfied. This paper proposes GeoRCG, a general two-stage framework: it first generates an informative geometric representation, then generates a molecule conditioned on that representation. Its core idea is to condition on semantically rich geometric representations, backed by theoretical guarantees, rather than on individual property values, which makes generation more goal-oriented and faster. With EDM and SemlaFlow as base generators, GeoRCG significantly improves unconditional generation quality on QM9 and GEOM-DRUG. In conditional generation it outperforms state-of-the-art methods by 31% on average, and representation guidance allows the number of diffusion steps to be cut from 1,000 to as few as 100 (a 90% reduction) while largely preserving generation quality.
📝 Abstract
Recent advancements in molecular generative models have demonstrated substantial potential in accelerating scientific discovery, particularly in drug design. However, these models often face challenges in generating high-quality molecules, especially in conditional scenarios where specific molecular properties must be satisfied. In this work, we introduce GeoRCG, a general framework to enhance the performance of molecular generative models by integrating geometric representation conditions with provable theoretical guarantees. We decompose the molecule generation process into two stages: first, generating an informative geometric representation; second, generating a molecule conditioned on the representation. Compared to directly generating a molecule, the relatively easy-to-generate representation in the first stage guides the second-stage generation to reach a high-quality molecule in a more goal-oriented and much faster way. Leveraging EDM and SemlaFlow as the base generators, we observe significant quality improvements in unconditional molecule generation tasks on the widely-used QM9 and GEOM-DRUG datasets. More notably, in the challenging conditional molecular generation task, our framework achieves an average 31% performance improvement over state-of-the-art approaches, highlighting the superiority of conditioning on semantically rich geometric representations over conditioning on individual property values as in previous approaches. Furthermore, we show that, with such representation guidance, the number of diffusion steps can be reduced to as few as 100 while largely preserving the generation quality achieved with 1,000 steps, thereby significantly accelerating the generation process.
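The two-stage decomposition described in the abstract can be illustrated with a toy sketch. This is not the GeoRCG implementation: every function name below (`sample_representation`, `toy_denoiser`, `sample_molecule`, `generate`) is hypothetical, the "representation generator" is plain Gaussian noise, and the "denoiser" is a fixed random linear map standing in for a learned network such as EDM or SemlaFlow. It only shows the control flow: draw a representation first, then iteratively refine noisy 3D coordinates conditioned on it.

```python
import random


def sample_representation(rng, dim=8):
    """Stage 1 (toy stand-in): draw a representation vector.

    Assumption: a real system would sample from a learned
    representation generator, not from a standard Gaussian.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_denoiser(rep, weights):
    """Hypothetical conditional denoiser.

    Predicts flattened "clean" coordinates as a linear map of the
    representation. A real denoiser would also take the current
    noisy state and the time step as inputs.
    """
    return [sum(w * r for w, r in zip(row, rep)) for row in weights]


def sample_molecule(rng, rep, weights, n_steps=100):
    """Stage 2 (toy stand-in): refine noisy coordinates toward the
    denoiser's prediction, conditioned on the representation."""
    x = [rng.gauss(0.0, 1.0) for _ in range(len(weights))]
    for step in range(n_steps):
        pred = toy_denoiser(rep, weights)
        # Step size grows to 1.0, so the final step lands on the prediction.
        alpha = 1.0 / (n_steps - step)
        x = [xi + alpha * (pi - xi) for xi, pi in zip(x, pred)]
    # Group the flat vector into (x, y, z) triples, one per atom.
    return [tuple(x[i:i + 3]) for i in range(0, len(x), 3)]


def generate(seed=0, n_atoms=5, dim=8, n_steps=100):
    """Run both stages: representation first, then coordinates."""
    rng = random.Random(seed)
    # Random weights stand in for trained parameters (assumption).
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(n_atoms * 3)]
    rep = sample_representation(rng, dim)
    coords = sample_molecule(rng, rep, weights, n_steps)
    return rep, coords
```

Calling `generate(n_steps=100)` returns the sampled representation and a list of per-atom coordinate triples. Because the toy denoiser ignores the noisy state, the step count has no real effect on the result here; in an actual diffusion or flow model the step count trades speed against sample quality, which is the trade-off the abstract reports for 100 versus 1,000 steps.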