AI Summary
This work addresses three core challenges in molecular 3D conformer generation: jointly modeling discrete graph structure and continuous geometry, handling Euclidean symmetries (rotations and translations), and designing conditioning that generalizes across molecules of varying size and structure. We propose DiTMC, a modular diffusion Transformer architecture that separates the processing of 3D coordinates from conditioning on atomic connectivity. Its key contributions are: (1) two complementary graph-based conditioning strategies that let the molecular graph guide the diffusion process; (2) a choice of attention mechanisms, standard non-equivariant or SO(3)-equivariant, that trades accuracy against computational cost; and (3) an analysis of how architectural choices and symmetry priors affect sample quality and efficiency. Evaluated on the GEOM-QM9, GEOM-DRUGS, and GEOM-XL benchmarks, DiTMC achieves state-of-the-art precision and physical validity.
Abstract
Diffusion Transformers (DiTs) have demonstrated strong performance in generative modeling, particularly in image synthesis, making them a compelling choice for molecular conformer generation. However, applying DiTs to molecules introduces novel challenges, such as integrating discrete molecular graph information with continuous 3D geometry, handling Euclidean symmetries, and designing conditioning mechanisms that generalize across molecules of varying sizes and structures. We propose DiTMC, a framework that adapts DiTs to address these challenges through a modular architecture that separates the processing of 3D coordinates from conditioning on atomic connectivity. To this end, we introduce two complementary graph-based conditioning strategies that integrate seamlessly with the DiT architecture. These are combined with different attention mechanisms, including both standard non-equivariant and SO(3)-equivariant formulations, enabling flexible control over the trade-off between accuracy and computational efficiency. Experiments on standard conformer generation benchmarks (GEOM-QM9, -DRUGS, -XL) demonstrate that DiTMC achieves state-of-the-art precision and physical validity. Our results highlight how architectural choices and symmetry priors affect sample quality and efficiency, suggesting promising directions for large-scale generative modeling of molecular structures. Code available at https://github.com/ML4MolSim/dit_mc.
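To make the Euclidean symmetries mentioned above concrete, the following is a minimal NumPy sketch. It is not taken from the DiTMC code base: the helper names (`random_rotation`, `pairwise_distances`, `center`) and the 5-atom random "conformer" are assumptions chosen purely for illustration. It checks two generic facts: interatomic distances are unchanged by a rigid rotation plus translation, and subtracting the centroid removes the translation while commuting with the rotation.

```python
import numpy as np


def random_rotation(rng):
    """Draw a proper rotation matrix (determinant +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to turn a reflection into a rotation
    return q


def pairwise_distances(x):
    """Return the (N, N) matrix of Euclidean distances between rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def center(x):
    """Subtract the centroid, which removes any global translation."""
    return x - x.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))  # hypothetical 5-atom conformer, illustration only
rot = random_rotation(rng)
shift = rng.normal(size=(1, 3))

# Apply a rigid motion: rotate every atom, then translate the whole molecule.
moved = coords @ rot.T + shift

# Invariance: interatomic distances ignore the rigid motion.
distances_match = np.allclose(pairwise_distances(coords), pairwise_distances(moved))

# Equivariance: centering the moved conformer equals rotating the centered one.
centering_commutes = np.allclose(center(moved), center(coords) @ rot.T)

print(distances_match, centering_commutes)
```

A model built only from invariant quantities such as distances is unaffected by the pose of its input, whereas an SO(3)-equivariant model produces outputs that rotate together with its input; a standard non-equivariant model offers neither guarantee by construction.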