Force-Aware Neural Tangent Kernels for Scalable and Robust Active Learning of MLIPs

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three challenges in active learning for machine-learned interatomic potentials (MLIPs): scaling to large candidate pools, jointly exploiting energy and force supervision, and staying robust when the candidate pool is biased relative to the target distribution. It proposes a linearly scaling acquisition framework based on chunked feature-space posterior-variance shortlisting, which avoids materialising the candidate and training-set kernel matrices and can screen roughly 200k structures within hours. The method extends the neural tangent kernel (NTK) to a force-aware setting via mixed parameter-coordinate derivatives, yielding a force NTK and a joint energy-force NTK as natural similarity measures for vector-field prediction. On OC20, the joint energy-force NTK achieves the lowest energy and force MAE and RMSE across all distribution splits. On the T1x, PMechDB, and RGD benchmarks, the force NTK methods remain competitive with established baselines while being significantly more efficient than committee-based approaches. In a controlled candidate-pool shift study on T1x, acquisition based on pretrained MLIP embeddings and NTKs remains robust, whereas committee-based methods show higher variance.
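
The force-aware kernels named above can be written out by applying the standard NTK definition (inner products of parameter Jacobians) to the energy and force outputs. The block below is a sketch under assumed notation: the symbols E_θ, x, θ, J_E, and J_F are introduced here for illustration, and the paper's exact normalisation, block weighting, or reduction to a scalar similarity may differ.

```latex
% Assumed notation (not from the source): E_\theta is the predicted energy,
% x \in \mathbb{R}^{3N} the atomic coordinates, \theta \in \mathbb{R}^{P} the parameters.
\begin{aligned}
F_\theta(x) &= -\nabla_x E_\theta(x), \\
J_E(x) &= \nabla_\theta E_\theta(x)^{\top} \in \mathbb{R}^{1 \times P}, \qquad
J_F(x) = \frac{\partial F_\theta(x)}{\partial \theta}
       = -\frac{\partial^2 E_\theta(x)}{\partial x\,\partial \theta} \in \mathbb{R}^{3N \times P}, \\
K_{EE}(x, x') &= J_E(x)\, J_E(x')^{\top}, \\
K_{EF}(x, x') &= J_E(x)\, J_F(x')^{\top}, \qquad
K_{FE}(x, x') = J_F(x)\, J_E(x')^{\top}, \\
K_{FF}(x, x') &= J_F(x)\, J_F(x')^{\top}, \\
K_{\text{joint}}(x, x') &=
\begin{pmatrix}
K_{EE}(x, x') & K_{EF}(x, x') \\
K_{FE}(x, x') & K_{FF}(x, x')
\end{pmatrix}.
\end{aligned}
```

The mixed derivative in J_F is where the "parameter-coordinate" structure enters: the force Jacobian with respect to the parameters is a second derivative of the energy, taken once in the coordinates and once in the parameters.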

📝 Abstract

Active learning for machine-learning interatomic potentials (MLIPs) must address several challenges to be practical: scaling to large candidate pools, leveraging energy-force supervision, and maintaining robustness when candidate pools are biased relative to the target distribution. In this work, we jointly address these challenges. We first introduce a linearly scaling acquisition framework based on chunked feature-space posterior-variance shortlisting. By avoiding materialisation of the candidate and train set kernels, this approach enables screening of ~200k structures within hours and applies broadly to acquisition strategies that score candidates based on molecular similarity metrics. We then extend the Neural Tangent Kernel (NTK) to a force-aware setting via mixed parameter-coordinate derivatives, yielding a force NTK and a joint energy-force NTK that provide natural similarity metrics for vector-field prediction. We demonstrate the effectiveness of the joint energy-force NTK on the OC20 dataset, where force-aware acquisition is crucial: it achieves the lowest energy and force MAE and RMSE across all metrics and distribution splits. Across T1x, PMechDB, and RGD benchmarks, our force NTK methods remain competitive with established baselines while being significantly more efficient than committee-based approaches. Under a controlled candidate-pool shift case study on T1x, acquisition based on pretrained MLIP embeddings and NTKs remains robust, whereas committee-based methods exhibit higher variance. Overall, these results show that a single pretrained MLIP can enable scalable, force-aware, and distribution-robust active learning for foundation-model fine-tuning.
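
To illustrate how posterior-variance shortlisting can avoid building candidate or training-set kernel matrices, here is a minimal NumPy sketch. It assumes a linear kernel on fixed feature vectors (for example, pretrained MLIP embeddings) and a ridge-style regulariser `lam`. The function name, parameters, and the one-shot top-k ranking are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np

def posterior_variance_shortlist(train_feats, pool_feats, k, lam=1e-3, chunk_size=1024):
    """Rank pool points by feature-space posterior variance, chunk by chunk.

    Hypothetical helper. With a linear kernel k(x, x') = phi(x) . phi(x'),
    the GP posterior variance equals lam * phi^T (Phi^T Phi + lam I)^{-1} phi,
    so only a d x d matrix is needed, never an n x n kernel.
    """
    d = train_feats.shape[1]
    # d x d regularised Gram matrix in feature space (cost linear in n_train).
    precision = train_feats.T @ train_feats + lam * np.eye(d)
    chol = np.linalg.cholesky(precision)  # precision = chol @ chol.T

    n_pool = pool_feats.shape[0]
    variances = np.empty(n_pool)
    # Stream over the pool so memory stays O(chunk_size * d).
    for start in range(0, n_pool, chunk_size):
        chunk = pool_feats[start:start + chunk_size]
        # z = chol^{-1} phi for every phi in the chunk; var = lam * ||z||^2.
        z = np.linalg.solve(chol, chunk.T)
        variances[start:start + chunk_size] = lam * np.sum(z * z, axis=0)

    # Indices of the k most uncertain candidates, highest variance first.
    shortlist = np.argsort(variances)[::-1][:k]
    return shortlist, variances
```

The cost is linear in the number of pool structures for a fixed feature dimension, which is the property the abstract attributes to the framework. This sketch scores every candidate independently; a batch-aware strategy would additionally update the precision matrix as points are selected.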

Problem

Research questions and friction points this paper is trying to address.

active learning

machine-learning interatomic potentials

force-aware

scalability

distribution robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Force-aware NTK

Scalable active learning

Machine-learning interatomic potentials