🤖 AI Summary
To address the poor training scalability and high inference latency of deep equivariant interatomic potentials in large-scale molecular dynamics simulations and high-throughput screening, this work presents a major overhaul of the NequIP framework. It introduces the first end-to-end application of the PyTorch 2.0 Ahead-of-Time Inductor (AOTInductor) compiler to equivariant potential inference, along with a custom CUDA kernel for the tensor product, the most expensive operation in the Allegro architecture. On the training side, it combines Distributed Data Parallel (DDP) with full-pipeline compilation to enable efficient multi-node training on large datasets, demonstrated by training Allegro models on the SPICE 2 dataset. Together, these optimizations speed up molecular dynamics inference on practically relevant system sizes by up to 18×, addressing the dual bottlenecks of training scalability and inference latency in equivariant potentials and providing a scalable, low-latency foundation for large-scale atomistic modeling.
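The train-time pattern the summary describes, wrapping the model in DDP and compiling the whole training step with the PyTorch 2.0 compiler, can be sketched as follows. This is an illustrative minimal example, not code from the NequIP repository: the tiny MLP stands in for an equivariant potential, the single-process `gloo` "cluster" replaces a real multi-node NCCL launch, and `backend="eager"` is used only so the sketch runs without a C++ toolchain (the paper's setup targets the default Inductor backend).

```python
# Hypothetical sketch of DDP + torch.compile training; the MLP is a
# stand-in for a NequIP/Allegro potential, not the actual model code.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train_step_demo():
    # Single-process "cluster" so the sketch runs anywhere; real runs use
    # torchrun with one rank per GPU and the NCCL backend.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = torch.nn.Sequential(
        torch.nn.Linear(16, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1)
    )
    ddp_model = DDP(model)

    # torch.compile captures the forward/backward graph; the paper's point
    # is removing graph breaks so the full pipeline compiles, not just
    # isolated submodules. backend="eager" keeps this sketch toolchain-free.
    compiled = torch.compile(ddp_model, backend="eager")

    opt = torch.optim.Adam(compiled.parameters(), lr=1e-3)
    x, y = torch.randn(8, 16), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(compiled(x), y)
    loss.backward()  # DDP averages gradients across ranks here
    opt.step()

    dist.destroy_process_group()
    return loss.item()
```

In a real multi-node run, the same script would be launched with `torchrun`, with rank and world size taken from the environment rather than hard-coded.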
📝 Abstract
Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presents a major overhaul of the NequIP framework focusing on multi-node parallelism, computational performance, and extensibility. The redesigned framework supports distributed training on large datasets and removes barriers preventing full utilization of the PyTorch 2.0 compiler at train time. We demonstrate this acceleration in a case study by training Allegro models on the SPICE 2 dataset of organic molecular systems. For inference, we introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Additionally, we implement a custom kernel for the Allegro model's most expensive operation, the tensor product. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by up to a factor of 18.