AI Summary
Uncertainty quantification for machine-learned interatomic potentials (MLIPs) typically relies on computationally expensive ensembles, whose disagreement signals often correlate only weakly with actual prediction errors. This work proposes PROBE, which reframes reliability assessment as a selective classification problem: a lightweight post-hoc discriminative classifier is placed on top of the frozen per-atom representations of a pretrained MLIP, so the method applies to any MLIP that exposes such representations. Without modifying the original model, PROBE enables efficient uncertainty quantification and produces chemically interpretable per-atom importance maps. Experiments on large held-out evaluation sets and two structurally distinct MLIP architectures show that PROBE outperforms ensemble disagreement as a binary reliability signal, yielding reliability probabilities that track true prediction errors monotonically, an advantage that grows with the expressiveness of the backbone representation.
Abstract
Prevailing machine-learned interatomic potential (MLIP) uncertainty-quantification methods rely on ensembles of independently trained backbones. These methods scale unfavorably with foundation-scale MLIPs, and their member-disagreement signals correlate weakly with per-molecule prediction error. Here we probe the frozen per-atom representations of a pretrained MLIP with a compact discriminative classifier, recasting MLIP uncertainty quantification as selective classification rather than error regression. The resulting method, PROBE (Post-hoc Reliability frOm Backbone Embeddings), produces a per-prediction reliability probability that monotonically tracks actual error without modification to the underlying model. Across large held-out evaluation sets and two structurally distinct MLIP architectures, PROBE outperforms ensemble disagreement as a binary reliability signal, which strengthens with the expressiveness of the backbone representation, implying a favorable scaling trajectory toward foundation-scale MLIPs. Multi-head self-attention additionally yields per-atom importance maps, providing chemically interpretable diagnostics at no additional computational cost. PROBE is post-hoc and architecture-agnostic, and is directly deployable on any MLIP that exposes per-atom representations.
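To make the selective-classification framing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-head attention-pooling probe as a simplified stand-in for the multi-head self-attention mentioned in the abstract. The names `probe_reliability` and `selective_metrics`, the parameters `query`, `weights`, and `bias`, and the error `tolerance` are all hypothetical and chosen for illustration only. The first function maps frozen per-atom embeddings to a reliability probability and per-atom attention weights; the second measures how many predictions are accepted at a given probability threshold and what fraction of the accepted ones exceed the error tolerance.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def probe_reliability(atom_embeddings, query, weights, bias):
    """Hypothetical single-head attention-pooling probe.

    atom_embeddings: list of per-atom vectors from a frozen backbone.
    query, weights, bias: probe parameters (assumed to be already trained).
    Returns (reliability probability, per-atom attention weights).
    """
    # Attention weights act as a per-atom importance map.
    attn = softmax([dot(h, query) for h in atom_embeddings])
    dim = len(atom_embeddings[0])
    # Attention-weighted average of the atom embeddings.
    pooled = [sum(a * h[k] for a, h in zip(attn, atom_embeddings))
              for k in range(dim)]
    # Logistic head: probability that the prediction is reliable.
    logit = dot(pooled, weights) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, attn


def selective_metrics(probs, abs_errors, tolerance, threshold):
    """Coverage and selective risk when accepting predictions with prob >= threshold.

    A prediction counts as a failure if its absolute error exceeds `tolerance`
    (an assumed, user-chosen cutoff).
    """
    accepted = [e for p, e in zip(probs, abs_errors) if p >= threshold]
    coverage = len(accepted) / len(probs)
    if not accepted:
        return coverage, 0.0
    risk = sum(e > tolerance for e in accepted) / len(accepted)
    return coverage, risk


# Toy usage with made-up numbers (not data from the paper).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prob, attn = probe_reliability(embeddings, query=[0.5, -0.5],
                               weights=[1.0, 1.0], bias=-1.0)
coverage, risk = selective_metrics(probs=[0.9, 0.8, 0.4, 0.2],
                                   abs_errors=[0.5, 2.0, 0.7, 3.0],
                                   tolerance=1.0, threshold=0.5)
```

Because the backbone is frozen, only the small probe would need training in such a setup, which is what keeps the approach cheap relative to training an ensemble of backbones. Sweeping `threshold` over a range of values traces out a risk-coverage curve for the reliability signal.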