🤖 AI Summary
This work addresses the difficulty that machine learning interatomic potentials (MLIPs) have in modeling long-range interactions in large systems, a gap that current approaches usually fill with explicit physics-based terms. The authors propose AllScAIP, a graph neural network architecture that avoids strong physics-inspired inductive biases and instead uses all-to-all node attention to learn long-range effects end to end from data, while remaining energy-conserving and scalable. Trained at the scale of O(100 million) samples, AllScAIP achieves state-of-the-art energy and force accuracy on the OMol25 benchmark, is competitive on OMat24 and OC20, and enables stable long-timescale molecular dynamics simulations that recover experimental densities and heats of vaporization.
📝 Abstract
Machine-learning interatomic potentials (MLIPs) have advanced rapidly, with many top models relying on strong physics-based inductive biases. However, as models scale to larger systems like biomolecules and electrolytes, they struggle to accurately capture long-range (LR) interactions, leading current approaches to rely on explicit physics-based terms or components. In this work, we propose AllScAIP, a straightforward, attention-based, and energy-conserving MLIP model that scales to O(100 million) training samples. It addresses the long-range challenge using an all-to-all node attention component that is data-driven. Extensive ablations reveal that in low-data/small-model regimes, inductive biases improve sample efficiency. However, as data and model size scale, these benefits diminish or even reverse, while all-to-all attention remains critical for capturing LR interactions. Our model achieves state-of-the-art energy/force accuracy on molecular systems, as well as on a number of physics-based evaluations (OMol25), while being competitive on materials (OMat24) and catalysts (OC20). Furthermore, it enables stable, long-timescale MD simulations that accurately recover experimental observables, including density and heat of vaporization.
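To make the two central ideas concrete, the following is a minimal toy sketch, not the AllScAIP architecture. It assumes scalar per-atom features, a hypothetical distance-penalized attention score, and a hand-picked `length_scale`; none of these come from the paper. It shows (1) attention in which every atom attends to every other atom with no distance cutoff, and (2) energy-conserving forces obtained as the negative gradient of a single scalar energy. Central finite differences stand in for the automatic differentiation a real model would use.

```python
import math

def pair_distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def attention_weights(positions, feats, i, length_scale=2.0):
    """Softmax weights of atom i over ALL other atoms (no cutoff radius).

    The score form (feature product minus a distance penalty) is a
    hypothetical choice made for illustration only.
    """
    logits = []
    for j in range(len(positions)):
        if j == i:
            continue
        r = pair_distance(positions[i], positions[j])
        logits.append(feats[i] * feats[j] - r / length_scale)
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def energy(positions, feats):
    """Scalar energy: each atom aggregates features from all other atoms."""
    n = len(positions)
    total = 0.0
    for i in range(n):
        weights = attention_weights(positions, feats, i)
        others = [j for j in range(n) if j != i]
        message = sum(w * feats[j] for w, j in zip(weights, others))
        total += feats[i] * message
    return total

def forces(positions, feats, h=1e-5):
    """Conservative forces F = -dE/dx via central finite differences."""
    result = []
    for i in range(len(positions)):
        f_i = []
        for axis in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][axis] += h
            minus[i][axis] -= h
            f_i.append(-(energy(plus, feats) - energy(minus, feats)) / (2 * h))
        result.append(f_i)
    return result

# Three nearby atoms plus one distant atom; values are arbitrary.
positions = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.3, 0.0), (4.0, 4.0, 4.0)]
feats = [0.5, -1.0, 0.8, 0.3]
F = forces(positions, feats)
```

Because the forces are derived from one scalar energy that depends only on interatomic distances, the energy is unchanged under rigid translation and the forces sum to approximately zero. The distant fourth atom still appears in every attention row, which is the sense in which the interaction has no cutoff.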