🤖 AI Summary
This study addresses three key challenges hindering the application of general-purpose machine learning interatomic potentials (MLIPs) in biomolecular simulation: inadequate modeling of long-range interactions, imbalanced training data distributions, and insufficient evaluation frameworks. We systematically assess the suitability of equivariant message-passing neural networks (EGNNs) for protein modeling. Using the SPICE-v2 dataset, we construct a multi-scale EGNN model that explicitly incorporates dispersion and electrostatic long-range contributions, and design targeted conformational sampling strategies alongside error-decomposition-based evaluation metrics. Our results indicate that: (1) explicit electrostatic terms increase conformational variability in Trp-cage simulations, although they show no systematic effect across the other systems studied; (2) predicted properties depend on training data composition, and larger models improve benchmark accuracy without consistently improving simulation-derived properties; and (3) current benchmark protocols inadequately capture biologically relevant properties, including conformational distributions and thermodynamic stability. This work provides both methodological advances, such as physically informed long-range coupling and refined evaluation, and empirical evidence to support reliable MLIP deployment in protein simulations.
📝 Abstract
Universal machine-learned potentials promise transferable accuracy across compositional and vibrational degrees of freedom, yet their application to biomolecular simulations remains underexplored. This work systematically evaluates equivariant message-passing architectures trained on the SPICE-v2 dataset with and without explicit long-range dispersion and electrostatics. We assess the impact of model size, training data composition, and electrostatic treatment across in- and out-of-distribution benchmark datasets, as well as molecular simulations of bulk liquid water, aqueous NaCl solutions, and biomolecules, including alanine tripeptide, the mini-protein Trp-cage, and Crambin. While larger models improve accuracy on benchmark datasets, this trend does not consistently extend to properties obtained from simulations. Predicted properties also depend on the composition of the training dataset. Long-range electrostatics show no systematic impact across systems. However, for Trp-cage, their inclusion yields increased conformational variability. Our results suggest that imbalanced datasets and immature evaluation practices currently challenge the applicability of universal machine-learned potentials to biomolecular simulations.