Proof-Carrying Materials: Falsifiable Safety Certificates for Machine-Learned Interatomic Potentials

📅 2026-03-12

🤖 AI Summary

This work addresses the limited reliability of machine-learned interatomic potentials (MLIPs) in high-throughput materials screening, where an MLIP used as a stability filter can discard candidates that density functional theory (DFT) finds stable. The authors propose Proof-Carrying Materials (PCM), a three-stage falsifiable safety certification framework: adversarial sampling to identify blind spots, bootstrap confidence envelopes to refine uncertainty estimates, and formal verification in Lean 4 to certify critical predictions. On a 25,000-material benchmark, a single MLIP filter reaches a recall of only 0.07 for DFT-stable materials. In a thermoelectric screening case study, the audited protocol improves recall of stable compounds, recovering 62 additional candidates that single-MLIP screening missed (a 25% improvement in discovery yield). A risk model trained on the discovered features predicts failures with an AUC-ROC of 0.938 and transfers across MLIP architectures with an AUC-ROC of about 0.70, and the audit reveals architecture-specific failure modes across different MLIPs.
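
Recall in this setting is the fraction of DFT-stable materials that the screening filter keeps. The following is a minimal sketch of that arithmetic, not code from the paper. The function name `recall` and the counts are hypothetical and chosen only for illustration.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of truly stable materials that the screen keeps."""
    total_stable = true_positives + false_negatives
    if total_stable == 0:
        raise ValueError("recall is undefined when there are no stable materials")
    return true_positives / total_stable


# Hypothetical counts (not from the paper): 1,000 DFT-stable materials,
# of which the MLIP stability filter keeps 70 and discards 930.
kept, missed = 70, 930
r = recall(kept, missed)
print(f"recall = {r:.2f}, missed fraction = {1 - r:.2f}")
# prints "recall = 0.07, missed fraction = 0.93"
```

A filter with low recall can still look accurate overall when stable materials are rare, which is why recall is the relevant number for a discovery screen.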

📝 Abstract

Machine-learned interatomic potentials (MLIPs) are deployed for high-throughput materials screening without formal reliability guarantees. We show that a single MLIP used as a stability filter misses 93% of density functional theory (DFT)-stable materials (recall 0.07) on a 25,000-material benchmark. Proof-Carrying Materials (PCM) closes this gap through three stages: adversarial falsification across compositional space, bootstrap envelope refinement with 95% confidence intervals, and Lean 4 formal certification. Auditing CHGNet, TensorNet and MACE reveals architecture-specific blind spots with near-zero pairwise error correlations (r ≤ 0.13; n = 5,000), confirmed by independent Quantum ESPRESSO validation (20/20 converged; median DFT/CHGNet force ratio 12×). A risk model trained on PCM-discovered features predicts failures on unseen materials (AUC-ROC = 0.938 ± 0.004) and transfers across architectures (cross-MLIP AUC-ROC ≈ 0.70; feature importance r = 0.877). In a thermoelectric screening case study, PCM-audited protocols discover 62 additional stable materials missed by single-MLIP screening, a 25% improvement in discovery yield.
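
The abstract does not spell out how the bootstrap envelope is built. As a rough illustration of the general idea only, the sketch below computes a generic percentile bootstrap 95% confidence interval for a mean error. The function `bootstrap_ci`, its parameters, and the synthetic error values are all assumptions made for this example and are not taken from the paper.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Generic textbook procedure; the paper's envelope refinement may differ.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_resamples)
    hi_idx = int((1.0 - alpha) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Synthetic per-material absolute errors (hypothetical, not from the paper).
data_rng = random.Random(42)
errors = [abs(data_rng.gauss(0.05, 0.02)) for _ in range(200)]

low, high = bootstrap_ci(errors)
print(f"mean error = {statistics.fmean(errors):.4f}")
print(f"95% bootstrap interval = [{low:.4f}, {high:.4f}]")
```

The percentile method is used here because it needs no distributional assumption about the errors. A fixed seed keeps the interval reproducible between runs.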

Problem

Research questions and friction points this paper is trying to address.

machine-learned interatomic potentials

reliability guarantees

materials screening

stability prediction

falsifiable safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proof-Carrying Materials

machine-learned interatomic potentials

adversarial falsification