🤖 AI Summary
Existing molecular generation methods face a trade-off between strict E(3)-equivariance and model scalability: rigorously equivariant architectures incur high computational cost, while relaxing symmetry constraints compromises physical plausibility. This work proposes a frame-based diffusion paradigm that achieves deterministic E(3)-equivariant generation while decoupling symmetry handling from the backbone network. It studies three variants (Global, Local, and Invariant Frame Diffusion) and pairs them with EdgeDiT, a Diffusion Transformer with edge-aware attention; the local variant additionally benefits from alignment constraints. On the QM9 benchmark, Global Frame Diffusion with EdgeDiT achieves a test negative log-likelihood of −137.97, atom stability of 98.98%, molecular stability of 90.51%, and sampling nearly 2x faster than EDM, surpassing all equivariant baselines.
📝 Abstract
Recent methods for molecular generation face a trade-off: they either enforce strict equivariance with costly architectures or relax it to gain scalability and flexibility. We propose a frame-based diffusion paradigm that achieves deterministic E(3)-equivariance while decoupling symmetry handling from the backbone. Building on this paradigm, we investigate three variants: Global Frame Diffusion (GFD), which assigns a shared molecular frame; Local Frame Diffusion (LFD), which constructs node-specific frames and benefits from additional alignment constraints; and Invariant Frame Diffusion (IFD), which relies on pre-canonicalized invariant representations. To enhance expressivity, we further utilize EdgeDiT, a Diffusion Transformer with edge-aware attention.
On the QM9 dataset, GFD with EdgeDiT achieves state-of-the-art performance, with a test NLL of −137.97 at standard scale and −141.85 at double scale, alongside atom stability of 98.98% and molecular stability of 90.51%. These results surpass all equivariant baselines while maintaining high validity and uniqueness, and sampling is nearly 2x faster than with EDM. Altogether, our study establishes frame-based diffusion as a scalable, flexible, and physically grounded paradigm for molecular generation, highlighting the critical role of global structure preservation.
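To make the idea of decoupling symmetry handling from the backbone concrete, the following is a minimal sketch of a global-frame wrapper in NumPy. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the GFD frame is constructed, so the PCA-based frame with third-moment sign fixing used here is an assumption, and the names `global_frame`, `to_frame`, `from_frame`, and `equivariant_denoiser` are hypothetical. Degenerate cases (repeated eigenvalues, vanishing third moments) are not handled.

```python
import numpy as np

def global_frame(x):
    """Build a shared frame for an (N, 3) point cloud.

    Assumption: the frame is taken from the principal axes of the centered
    coordinates. This is one common choice, not necessarily the paper's.
    """
    center = x.mean(axis=0)
    xc = x - center
    # Columns of `axes` are orthonormal principal directions
    # (eigh returns eigenvalues in ascending order).
    _, axes = np.linalg.eigh(xc.T @ xc)
    # Each eigenvector is only defined up to sign; fix it with the sign of
    # the third moment of the projections so the frame is deterministic.
    proj = xc @ axes
    signs = np.sign((proj ** 3).sum(axis=0))
    signs[signs == 0] = 1.0  # arbitrary fallback for the degenerate case
    return center, axes * signs

def to_frame(x, center, axes):
    """Express coordinates in the frame; the result is E(3)-invariant."""
    return (x - center) @ axes

def from_frame(v_local, axes):
    """Rotate per-atom vectors predicted in the frame back to world axes."""
    return v_local @ axes.T

def equivariant_denoiser(x, backbone):
    """Wrap an arbitrary (non-equivariant) backbone so that its per-atom
    vector output rotates with the input and ignores translations."""
    center, axes = global_frame(x)
    x_local = to_frame(x, center, axes)
    return from_frame(backbone(x_local), axes)
```

Because the backbone only ever sees frame coordinates, it can be any unconstrained network, for example a Transformer; the equivariance comes entirely from the frame change before and after it. Applying a random orthogonal transform and translation to `x` should leave `to_frame` unchanged and rotate the output of `equivariant_denoiser` by the same transform, up to floating-point error.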