🤖 AI Summary
To address the poor training scalability and high inference latency of deep equivariant interatomic potentials in large-scale molecular dynamics simulations and high-throughput screening, this work presents a major overhaul of the NequIP framework. It introduces the first end-to-end application of the PyTorch 2.0 Ahead-of-Time Inductor (AOTInductor) compiler to equivariant potential inference, along with a custom CUDA kernel for the tensor product, the most expensive operation in the Allegro architecture. On the training side, it combines Distributed Data Parallel (DDP) with full-pipeline compilation to enable efficient multi-node training on large datasets, demonstrated by training Allegro models on the SPICE 2 dataset. Together, these optimizations speed up molecular dynamics inference on practically relevant system sizes by up to 18×, addressing the dual bottlenecks of training scalability and inference latency in equivariant potentials and providing a scalable, low-latency foundation for large-scale atomistic modeling.
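The train-time pattern the summary describes, wrapping the model in DDP and compiling the whole training step with the PyTorch 2.0 compiler, can be sketched as follows. This is an illustrative minimal example, not code from the NequIP repository: the tiny MLP stands in for an equivariant potential, the single-process `gloo` "cluster" replaces a real multi-node NCCL launch, and `backend="eager"` is used only so the sketch runs without a C++ toolchain (the paper's setup targets the default Inductor backend).

```python
# Hypothetical sketch of DDP + torch.compile training; the MLP is a
# stand-in for a NequIP/Allegro potential, not the actual model code.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train_step_demo():
    # Single-process "cluster" so the sketch runs anywhere; real runs use
    # torchrun with one rank per GPU and the NCCL backend.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = torch.nn.Sequential(
        torch.nn.Linear(16, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1)
    )
    ddp_model = DDP(model)

    # torch.compile captures the forward/backward graph; the paper's point
    # is removing graph breaks so the full pipeline compiles, not just
    # isolated submodules. backend="eager" keeps this sketch toolchain-free.
    compiled = torch.compile(ddp_model, backend="eager")

    opt = torch.optim.Adam(compiled.parameters(), lr=1e-3)
    x, y = torch.randn(8, 16), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(compiled(x), y)
    loss.backward()  # DDP averages gradients across ranks here
    opt.step()

    dist.destroy_process_group()
    return loss.item()
```

In a real multi-node run, the same script would be launched with `torchrun`, with rank and world size taken from the environment rather than hard-coded.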
📝 Abstract
Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presents a major overhaul of the NequIP framework focusing on multi-node parallelism, computational performance, and extensibility. The redesigned framework supports distributed training on large datasets and removes barriers preventing full utilization of the PyTorch 2.0 compiler at train time. We demonstrate this acceleration in a case study by training Allegro models on the SPICE 2 dataset of organic molecular systems. For inference, we introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Additionally, we implement a custom kernel for the Allegro model's most expensive operation, the tensor product. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by up to a factor of 18.