Performance of universal machine-learned potentials with explicit long-range interactions in biomolecular simulations

📅 2025-08-14
🤖 AI Summary
This study examines three obstacles to applying universal machine-learned interatomic potentials (MLIPs) to biomolecular simulations: the treatment of long-range interactions, imbalanced training data, and immature evaluation practices. It systematically evaluates equivariant message-passing architectures trained on the SPICE-v2 dataset, with and without explicit long-range dispersion and electrostatics, on in- and out-of-distribution benchmarks and in simulations of bulk liquid water, aqueous NaCl solutions, alanine tripeptide, Trp-cage, and Crambin. The main findings are: (1) long-range electrostatics show no systematic effect across systems, although their inclusion increases conformational variability for Trp-cage; (2) larger models improve benchmark accuracy, but this gain does not consistently carry over to simulated properties, which also depend on training data composition; and (3) imbalanced datasets and immature evaluation practices currently limit the applicability of universal MLIPs to biomolecular simulations.

📝 Abstract
Universal machine-learned potentials promise transferable accuracy across compositional and vibrational degrees of freedom, yet their application to biomolecular simulations remains underexplored. This work systematically evaluates equivariant message-passing architectures trained on the SPICE-v2 dataset with and without explicit long-range dispersion and electrostatics. We assess the impact of model size, training data composition, and electrostatic treatment across in- and out-of-distribution benchmark datasets, as well as molecular simulations of bulk liquid water, aqueous NaCl solutions, and biomolecules, including alanine tripeptide, the mini-protein Trp-cage, and Crambin. While larger models improve accuracy on benchmark datasets, this trend does not consistently extend to properties obtained from simulations. Predicted properties also depend on the composition of the training dataset. Long-range electrostatics show no systematic impact across systems. However, for Trp-cage, their inclusion yields increased conformational variability. Our results suggest that imbalanced datasets and immature evaluation practices currently challenge the applicability of universal machine-learned potentials to biomolecular simulations.
Problem

Research questions and friction points this paper is trying to address.

Evaluating universal machine-learned potentials in biomolecular simulations
Assessing impact of model size and training data composition
Challenges in applying universal potentials to biomolecular systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Equivariant message-passing architectures for biomolecular simulations
Explicit long-range dispersion and electrostatics integration
Training on SPICE-v2 dataset with varied compositions
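
To make "explicit long-range electrostatics" concrete, the following is a minimal sketch of the general idea: a short-range learned energy plus a pairwise Coulomb sum over atomic charges. It is not the authors' implementation. The function names `coulomb_energy` and `total_energy`, the `short_range_energy` callable, and the use of fixed point charges are assumptions for illustration. The sketch also assumes open (non-periodic) boundaries and applies no short-range damping, both of which a practical model would need to handle.

```python
import math
from itertools import combinations

# e^2 / (4 * pi * eps0) expressed in eV * Angstrom
COULOMB_EV_ANGSTROM = 14.399645


def coulomb_energy(positions, charges):
    """Pairwise Coulomb energy in eV for point charges (in units of e)
    at positions given in Angstrom, with open boundary conditions."""
    energy = 0.0
    for (i, r_i), (j, r_j) in combinations(enumerate(positions), 2):
        energy += charges[i] * charges[j] / math.dist(r_i, r_j)
    return COULOMB_EV_ANGSTROM * energy


def total_energy(positions, charges, short_range_energy):
    """Sum of a short-range term and the explicit long-range term.

    `short_range_energy` is a hypothetical callable standing in for a
    message-passing model that only sees local neighborhoods.
    """
    return short_range_energy(positions) + coulomb_energy(positions, charges)


if __name__ == "__main__":
    # Two opposite unit charges separated by 1 Angstrom.
    pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    q = [1.0, -1.0]
    print(total_energy(pos, q, short_range_energy=lambda p: 0.0))
```

Separating the two terms keeps the learned part local while the physics-based part covers interactions beyond the model's cutoff; for periodic systems such as bulk water, the direct sum would be replaced by an Ewald-type method.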
Viktor Zaverkin
NEC Laboratories Europe GmbH, Kurfürsten-Anlage 36, 69115 Heidelberg, Germany
Matheus Ferraz
NEC OncoImmunity AS, Forskningsparken, Gaustadalléen 21, 0349 Oslo, Norway
Francesco Alesiani
NEC Laboratories Europe
Machine Learning, Information Theory, Optimization, Control
Mathias Niepert
University of Stuttgart & NEC Labs Europe
Machine learning