Knowing when to trust machine-learned interatomic potentials

📅 2026-05-01
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Uncertainty quantification for machine-learned interatomic potentials (MLIPs) typically relies on computationally expensive ensembles, whose predicted uncertainties correlate only weakly with actual errors. This work proposes PROBE, which reframes reliability assessment as a selective classification problem by training a lightweight post-hoc discriminative classifier on frozen per-atom representations from a pretrained MLIP. Without modifying the original model, PROBE provides efficient uncertainty quantification and chemically interpretable per-atom importance maps. Experiments on large held-out evaluation sets and two structurally distinct MLIP backbones show that PROBE outperforms ensemble disagreement, yielding reliability probabilities that track true prediction error monotonically, an advantage that grows with the expressiveness of the backbone representation.
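To make the selective-classification framing concrete, here is a minimal sketch of how a reliability probability could be used to accept or reject predictions. The function name `selective_report`, the cutoff values, and all numbers are hypothetical and made up for illustration; none of them come from the paper.

```python
from typing import List, Tuple


def selective_report(
    probs: List[float], errors: List[float], cutoff: float
) -> Tuple[float, float]:
    """Return (coverage, mean error of accepted predictions).

    A prediction is accepted when its reliability probability is at
    least `cutoff`. Coverage is the fraction of predictions accepted.
    """
    accepted = [e for p, e in zip(probs, errors) if p >= cutoff]
    coverage = len(accepted) / len(probs)
    mean_err = sum(accepted) / len(accepted) if accepted else float("nan")
    return coverage, mean_err


# Toy data (assumed): reliability probabilities and absolute errors
# for five predictions, arranged so that higher probability goes with
# lower error.
probs = [0.95, 0.90, 0.60, 0.30, 0.10]
errors = [0.01, 0.02, 0.05, 0.20, 0.50]

for cutoff in (0.0, 0.5, 0.9):
    coverage, mean_err = selective_report(probs, errors, cutoff)
    print(f"cutoff={cutoff:.1f} coverage={coverage:.2f} mean_error={mean_err:.4f}")
```

When the reliability signal tracks error monotonically, raising the cutoff trades coverage for a lower mean error on the accepted predictions, which is the behavior a selective classifier is meant to provide.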
📝 Abstract
Prevailing machine-learned interatomic potential (MLIP) uncertainty-quantification methods rely on ensembles of independently trained backbones. These methods scale unfavorably with foundation-scale MLIPs, and their member-disagreement signals correlate weakly with per-molecule prediction error. Here we probe the frozen per-atom representations of a pretrained MLIP with a compact discriminative classifier, recasting MLIP uncertainty quantification as selective classification rather than error regression. The resulting method, PROBE (Post-hoc Reliability frOm Backbone Embeddings), produces a per-prediction reliability probability that monotonically tracks actual error without modification to the underlying model. Across large held-out evaluation sets and two structurally distinct MLIP architectures, PROBE outperforms ensemble disagreement as a binary reliability signal, which strengthens with the expressiveness of the backbone representation, implying a favorable scaling trajectory toward foundation-scale MLIPs. Multi-head self-attention additionally yields per-atom importance maps, providing chemically interpretable diagnostics at no additional computational cost. PROBE is post-hoc and architecture-agnostic, and is directly deployable on any MLIP that exposes per-atom representations.
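As a rough illustration of probing frozen per-atom representations, the sketch below pools atom embeddings with attention and passes the result through a logistic head. It is not the authors' implementation: a single learned-query attention pooling stands in for the multi-head self-attention the abstract mentions, and the names `probe_forward`, `query`, `head_w`, and `head_b`, along with all numeric values, are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def probe_forward(
    atom_embeddings: List[Vector], query: Vector, head_w: Vector, head_b: float
) -> Tuple[float, List[float]]:
    """Return (reliability probability, per-atom attention weights).

    `atom_embeddings` are treated as frozen backbone outputs; only
    `query`, `head_w`, and `head_b` would be trained in this sketch.
    """
    d = len(query)
    # Scaled dot-product score of each atom against the learned query.
    scores = [dot(h, query) / math.sqrt(d) for h in atom_embeddings]
    weights = softmax(scores)
    # Attention-weighted average of the atom embeddings.
    pooled = [
        sum(w * h[k] for w, h in zip(weights, atom_embeddings)) for k in range(d)
    ]
    logit = dot(pooled, head_w) + head_b
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, weights


# Toy three-atom "molecule" with 2-dimensional embeddings (assumed values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prob, weights = probe_forward(
    embeddings, query=[2.0, 0.0], head_w=[1.0, -1.0], head_b=0.0
)
print(prob, weights)
```

The attention weights sum to one over the atoms, so they can be read as a per-atom importance map produced by the same forward pass that yields the reliability probability.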
Problem

Research questions and friction points this paper is trying to address.

machine-learned interatomic potentials
uncertainty quantification
reliability assessment
foundation-scale models
prediction error
Innovation

Methods, ideas, or system contributions that make the work stand out.

uncertainty quantification
machine-learned interatomic potentials
selective classification
post-hoc reliability
attention-based interpretability