Accelerating materials discovery using foundation model-based in-context active learning

πŸ“… 2026-03-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work proposes In-Context Active Learning (ICAL), a novel approach that leverages the pre-trained tabular foundation model TabPFN for materials discovery. Traditional active learning methods are hindered by the rigid kernel assumptions of Gaussian processes or the unreliable uncertainty estimates of random forests under small-sample conditions. ICAL circumvents these limitations by exploiting TabPFN’s Transformer architecture to perform Bayesian inference without fine-tuning, achieving well-calibrated uncertainty estimates in a single forward pass. Evaluated across ten materials datasets, ICAL outperforms conventional methods on eight, reducing the required number of experiments by 52% and 29.77% on average compared to Gaussian processes and random forests, respectively, while demonstrating significantly superior uncertainty calibration.

πŸ“ Abstract
Active learning (AL) has emerged as a powerful paradigm for accelerating materials discovery by iteratively steering experiments toward the most promising candidates, reducing costly synthesis-and-characterization cycles. However, current AL relies predominantly on Gaussian Process (GP) and Random Forest (RF) surrogates with complementary limitations: GP underfits complex composition–property landscapes due to rigid kernel assumptions, while RF produces unreliable uncertainty estimates in small-data regimes (fewer than 500 samples), precisely where most materials datasets reside. Here we propose foundation model-based In-Context Active Learning (ICAL), replacing conventional surrogates with TabPFN, a transformer-based foundation model pre-trained on millions of synthetic tasks to meta-learn a universal prior over tabular data. TabPFN performs principled Bayesian inference in a single forward pass without dataset-specific retraining, delivering well-calibrated predictive uncertainty where GP and RF fail most severely. Benchmarked against GP and RF across 10 materials datasets spanning copper alloy hardness and electrical conductivity, bulk metallic glass-forming ability, and crystal lattice thermal conductivity, TabPFN wins on 8 out of 10 datasets, achieving a mean saving of 52% in extra experiments/evaluations relative to GP and 29.77% relative to RF. Cross-validation analysis confirms that TabPFN's advantage stems from superior uncertainty calibration, achieving the lowest Negative Log-Likelihood and Area Under the Sparsification Error curve among all surrogates. Our work demonstrates that a pre-trained foundation model can serve as a highly effective surrogate for accelerating active learning-based materials discovery.
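The abstract describes a surrogate-driven active-learning loop: a model predicts a mean and an uncertainty for each unlabelled candidate, an acquisition function scores candidates, and the top-scoring candidate is "measured" next. The sketch below illustrates that loop only; the k-nearest-neighbour surrogate is a hypothetical stand-in for TabPFN (which would instead return calibrated posteriors in a single forward pass), and the upper-confidence-bound acquisition is an assumption, since the abstract does not name the acquisition function used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an expensive experiment: 1-D "composition" -> property to maximize.
def run_experiment(x):
    return np.sin(3.0 * x) + 0.5 * x

# Hypothetical surrogate: mean and uncertainty per candidate.
# A kNN stand-in replaces TabPFN purely for illustration.
def surrogate_predict(X_train, y_train, X_pool, k=3):
    dists = np.abs(X_pool[:, None] - X_train[None, :])
    idx = np.argsort(dists, axis=1)[:, :k]
    neigh = y_train[idx]
    mean = neigh.mean(axis=1)
    # Neighbour spread plus distance to nearest label inflates uncertainty
    # far from observed data (a crude proxy for a calibrated posterior).
    std = neigh.std(axis=1) + dists.min(axis=1)
    return mean, std

# Active-learning loop: query the candidate with the highest
# upper confidence bound (UCB), never re-querying a labelled point.
X_pool = np.linspace(0.0, 2.0, 200)
labelled = [int(i) for i in rng.choice(len(X_pool), size=5, replace=False)]
for _ in range(15):
    X_train = X_pool[labelled]
    y_train = run_experiment(X_train)
    mean, std = surrogate_predict(X_train, y_train, X_pool)
    ucb = mean + 1.96 * std
    ucb[labelled] = -np.inf
    labelled.append(int(np.argmax(ucb)))

# Best composition found within the experimental budget of 20 queries.
best_x = float(X_pool[labelled][np.argmax(run_experiment(X_pool[labelled]))])
print(best_x)
```

Swapping the surrogate for a real TabPFN regressor would keep the loop unchanged; only `surrogate_predict` would be replaced by the foundation model's in-context prediction, which is the substitution the paper's ICAL method makes.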
Problem

Research questions and friction points this paper is trying to address.

active learning
materials discovery
surrogate models
uncertainty estimation
small-data regimes
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model
active learning
TabPFN
materials discovery
uncertainty calibration
πŸ”Ž Similar Papers
No similar papers found.