🤖 AI Summary
Existing 3D molecular generation methods suffer from slow sampling, low chemical validity, or geometric distortions. This paper introduces the first unconditional E(3)-equivariant flow matching framework for jointly generating molecular graphs (atom types, bond types, and formal charges) and their 3D conformations. Key contributions include: (1) Semla, a scalable E(3)-equivariant message-passing architecture; (2) latent-space attention to strengthen joint modeling of structure and geometry; and (3) new benchmark metrics that expose biases in prevailing evaluation protocols. On standard datasets such as GEOM-QM9, the method achieves state-of-the-art performance with only 20 sampling steps, a 100× speedup over the best prior method, while attaining >99.5% chemical validity, significantly lower bond-angle and dihedral-angle errors, and improved conformational diversity and FCD scores.
📝 Abstract
Methods for jointly generating molecular graphs along with their 3D conformations have gained prominence recently due to their potential impact on structure-based drug design. Current approaches, however, often suffer from very slow sampling or generate molecules with poor chemical validity. Addressing these limitations, we propose Semla, a scalable E(3)-equivariant message passing architecture. We further introduce an unconditional 3D molecular generation model, SemlaFlow, which is trained using equivariant flow matching to model a joint distribution over atom types, coordinates, bond types and formal charges. Our model produces state-of-the-art results on benchmark datasets with as few as 20 sampling steps, corresponding to a speedup of two orders of magnitude over the state of the art. Furthermore, we highlight limitations of current evaluation methods for 3D generation and propose new benchmark metrics for unconditional molecular generators. Finally, using these new metrics, we compare our model's ability to generate high-quality samples against current approaches and further demonstrate SemlaFlow's strong performance.
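To make the "20 sampling steps" claim concrete, the following is a minimal sketch of how a flow matching sampler can integrate from noise to a 3D structure with a fixed number of Euler steps. It is not SemlaFlow's implementation. It covers coordinates only, whereas the model in the abstract also generates atom types, bond types and formal charges. The endpoint-prediction parameterization, the zero-centroid noise, and the names `centre`, `sample_noise`, `euler_sample`, `oracle` and `TARGET` are all assumptions made for illustration, and `oracle` is a stand-in for a trained equivariant network.

```python
import random


def centre(coords):
    """Subtract the centroid so the point cloud has zero centre of mass."""
    n = len(coords)
    mean = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - mean[k] for k in range(3)] for p in coords]


def sample_noise(n_atoms, rng):
    """Draw Gaussian coordinates and remove the centroid (assumed prior)."""
    return centre([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)])


def euler_sample(predict_x1, x0, n_steps=20):
    """Integrate dx/dt = (x1_hat - x) / (1 - t) from t=0 to t=1 with Euler steps.

    predict_x1(x, t) is a hypothetical model that returns predicted final
    coordinates given the current noisy coordinates and the time t.
    """
    x = [list(p) for p in x0]
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x1_hat = predict_x1(x, t)
        # Velocity of the straight-line path toward the predicted endpoint.
        x = [
            [xi[k] + dt * (x1i[k] - xi[k]) / (1.0 - t) for k in range(3)]
            for xi, x1i in zip(x, x1_hat)
        ]
    return x


# Toy target geometry (three atoms, roughly water-like); illustrative only.
TARGET = centre([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])


def oracle(x, t):
    """Stand-in for a trained network: always predicts the toy target."""
    return TARGET


rng = random.Random(0)
x0 = sample_noise(3, rng)
sample = euler_sample(oracle, x0, n_steps=20)
```

With a perfect endpoint predictor the final Euler step lands on the predicted structure, so the step count mainly controls how often the network is queried along the path. In a real sampler the prediction changes from step to step, and sample quality then depends on both the network and the number of steps.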