🤖 AI Summary
This work addresses the severe scalability bottlenecks that mainstream equivariant graph neural networks face when modeling 3D atomic systems, which stem from the explicit construction of geometric features or dense tensor products on every edge. To overcome this, we propose an efficient equivariant attention architecture whose core operation, Equivariant Axis-Aligned Sparsification (EAAS), builds on Wigner-6j convolutions and an SO(3)→SO(2) change of basis to replace dense tensor contractions with sparse parity re-indexing, thereby avoiding explicit edge tensor construction. A fully node-centric attention mechanism with a node-centric normalization scheme, implemented via hardware-aware fused Triton kernels and SRAM optimization strategies, further accelerates inference. Evaluated on the SPICE and OMol25 datasets, our model achieves comparable predictive accuracy while delivering up to a 20× improvement in TFLOPS-based inference speed.
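To make the "no materialized edge tensors" idea concrete, here is a minimal sketch (not the paper's Triton kernel; all names and shapes are illustrative assumptions) contrasting an edge-materialized message pass, which writes an (E, D) tensor to memory, with a fused node-centric formulation that streams each edge's contribution directly into the destination node, as a fused GPU kernel would do through SRAM:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 4                      # nodes, feature dim (illustrative)
x = rng.normal(size=(N, D))      # node features
edges = np.array([[0, 1], [1, 2], [2, 0], [3, 4], [4, 5]])  # (src, dst) pairs

# (a) Edge-materialized: build one message per edge, then scatter.
#     The (E, D) tensor `msgs` must live in (global) memory.
msgs = x[edges[:, 0]] * x[edges[:, 1]]
out_a = np.zeros((N, D))
np.add.at(out_a, edges[:, 1], msgs)

# (b) On-the-fly: fuse gather, multiply, and scatter per edge so the
#     (E, D) tensor is never materialized -- each contribution is
#     computed and accumulated in one pass.
out_b = np.zeros((N, D))
for s, d in edges:
    out_b[d] += x[s] * x[d]

assert np.allclose(out_a, out_b)
```

Both paths produce identical node outputs; the difference is purely in memory traffic, which is what the fused-kernel design targets.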
📝 Abstract
Equivariant Graph Neural Networks (EGNNs) have become a widely used approach for modeling 3D atomistic systems. However, mainstream architectures face critical scalability bottlenecks due to the explicit construction of geometric features or dense tensor products on \textit{every} edge. To overcome this, we introduce \textbf{E2Former-V2}, a scalable architecture that integrates algebraic sparsity with hardware-aware execution. We first propose \textbf{E}quivariant \textbf{A}xis-\textbf{A}ligned \textbf{S}parsification (EAAS). EAAS builds on Wigner-$6j$ convolution by exploiting an $\mathrm{SO}(3) \rightarrow \mathrm{SO}(2)$ change of basis to transform computationally expensive dense tensor contractions into efficient, sparse parity re-indexing operations. Building on this representation, we introduce \textbf{On-the-Fly Equivariant Attention}, a fully node-centric mechanism implemented via a custom fused Triton kernel. By eliminating materialized edge tensors and maximizing SRAM utilization, our kernel achieves a \textbf{20$\times$ improvement in TFLOPS} compared to standard implementations. Extensive experiments on the SPICE and OMol25 datasets demonstrate that E2Former-V2 maintains comparable predictive performance while notably accelerating inference. This work demonstrates that large equivariant transformers can be trained efficiently using widely accessible GPU platforms. The code is available at https://github.com/IQuestLab/UBio-MolFM/tree/e2formerv2.
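The sparsification enabled by the $\mathrm{SO}(3) \rightarrow \mathrm{SO}(2)$ change of basis can be illustrated with a small numeric sketch (in the spirit of eSCN-style convolutions, not the exact EAAS operator; the coupling weights below are random stand-ins, not real Clebsch-Gordan coefficients). Once an edge is rotated so its direction lies along a fixed axis, the edge's spherical harmonics vanish for all $m \neq 0$, so a dense contraction over all $m$ collapses to a single sparse slice:

```python
import numpy as np

def real_sph_l1(v):
    """Real l=1 spherical harmonics of direction v, up to a constant:
    components (y, z, x)/|v|, ordered as m = -1, 0, +1."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.array([v[1], v[2], v[0]])

def rotation_to_z(n):
    """Build a rotation matrix whose rows form an orthonormal frame
    with n as the third row, so R @ n lands on the z-axis."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(n, a); e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    return np.stack([e2, e1, n])

rng = np.random.default_rng(0)
edge = np.array([0.3, -0.5, 0.8])      # an arbitrary edge vector
R = rotation_to_z(edge)

# After alignment, only the m = 0 harmonic survives.
Y_rot = real_sph_l1(R @ edge)          # -> (0, 1, 0)

# Dense contraction over all m vs. the sparse m = 0 slice.
x_feat = rng.normal(size=3)            # a toy l=1 node feature
W = rng.normal(size=(3, 3, 3))         # stand-in coupling weights
dense = np.einsum('abc,b,c->a', W, Y_rot, x_feat)
sparse = Y_rot[1] * (W[:, 1, :] @ x_feat)

assert np.allclose(Y_rot, [0.0, 1.0, 0.0])
assert np.allclose(dense, sparse)
```

The dense einsum touches every $(m_1, m_2)$ pair, while the sparse form reads a single weight slice; the same collapse at higher $l$ is what turns dense tensor products into cheap re-indexing.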