Accelerating materials discovery using foundation model-based in-context active learning

πŸ“… 2026-03-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work proposes In-Context Active Learning (ICAL), a novel approach that leverages the pre-trained tabular foundation model TabPFN for materials discovery. Traditional active learning methods are hindered by the rigid kernel assumptions of Gaussian processes or the unreliable uncertainty estimates of random forests under small-sample conditions. ICAL circumvents these limitations by exploiting TabPFN’s Transformer architecture to perform Bayesian inference without fine-tuning, achieving well-calibrated uncertainty estimates in a single forward pass. Evaluated across ten materials datasets, ICAL outperforms conventional methods on eight, reducing the required number of experiments by 52% and 29.77% on average compared to Gaussian processes and random forests, respectively, while demonstrating significantly superior uncertainty calibration.

πŸ“ Abstract
Active learning (AL) has emerged as a powerful paradigm for accelerating materials discovery by iteratively steering experiments toward the most promising candidates, reducing costly synthesis-and-characterization cycles. However, current AL relies predominantly on Gaussian Process (GP) and Random Forest (RF) surrogates with complementary limitations: GP underfits complex composition–property landscapes due to rigid kernel assumptions, while RF produces unreliable uncertainty estimates in small-data regimes (fewer than 500 samples), precisely where most materials datasets reside. Here we propose foundation model-based In-Context Active Learning (ICAL), replacing conventional surrogates with TabPFN, a transformer-based foundation model pre-trained on millions of synthetic tasks to meta-learn a universal prior over tabular data. TabPFN performs principled Bayesian inference in a single forward pass without dataset-specific retraining, delivering well-calibrated predictive uncertainty where GP and RF fail most severely. Benchmarked against GP and RF across 10 materials datasets spanning copper alloy hardness and electrical conductivity, bulk metallic glass-forming ability, and crystal lattice thermal conductivity, TabPFN wins on 8 out of 10 datasets, achieving a mean saving of 52% in extra experiments/evaluations relative to GP and 29.77% relative to RF. Cross-validation analysis confirms that TabPFN's advantage stems from superior uncertainty calibration, achieving the lowest Negative Log-Likelihood and Area Under the Sparsification Error curve among all surrogates. Our work demonstrates that a pre-trained foundation model can serve as a highly effective surrogate for accelerating active learning-based materials discovery.
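The abstract describes a surrogate-driven active-learning loop: a model predicts a mean and an uncertainty for each unlabelled candidate, an acquisition function scores candidates, and the top-scoring candidate is "measured" next. The sketch below illustrates that loop only; the k-nearest-neighbour surrogate is a hypothetical stand-in for TabPFN (which would instead return calibrated posteriors in a single forward pass), and the upper-confidence-bound acquisition is an assumption, since the abstract does not name the acquisition function used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an expensive experiment: 1-D "composition" -> property to maximize.
def run_experiment(x):
    return np.sin(3.0 * x) + 0.5 * x

# Hypothetical surrogate: mean and uncertainty per candidate.
# A kNN stand-in replaces TabPFN purely for illustration.
def surrogate_predict(X_train, y_train, X_pool, k=3):
    dists = np.abs(X_pool[:, None] - X_train[None, :])
    idx = np.argsort(dists, axis=1)[:, :k]
    neigh = y_train[idx]
    mean = neigh.mean(axis=1)
    # Neighbour spread plus distance to nearest label inflates uncertainty
    # far from observed data (a crude proxy for a calibrated posterior).
    std = neigh.std(axis=1) + dists.min(axis=1)
    return mean, std

# Active-learning loop: query the candidate with the highest
# upper confidence bound (UCB), never re-querying a labelled point.
X_pool = np.linspace(0.0, 2.0, 200)
labelled = [int(i) for i in rng.choice(len(X_pool), size=5, replace=False)]
for _ in range(15):
    X_train = X_pool[labelled]
    y_train = run_experiment(X_train)
    mean, std = surrogate_predict(X_train, y_train, X_pool)
    ucb = mean + 1.96 * std
    ucb[labelled] = -np.inf
    labelled.append(int(np.argmax(ucb)))

# Best composition found within the experimental budget of 20 queries.
best_x = float(X_pool[labelled][np.argmax(run_experiment(X_pool[labelled]))])
print(best_x)
```

Swapping the surrogate for a real TabPFN regressor would keep the loop unchanged; only `surrogate_predict` would be replaced by the foundation model's in-context prediction, which is the substitution the paper's ICAL method makes.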
Problem

Research questions and friction points this paper is trying to address.

active learning
materials discovery
surrogate models
uncertainty estimation
small-data regimes
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model
active learning
TabPFN
materials discovery
uncertainty calibration
πŸ”Ž Similar Papers
No similar papers found.