AI Summary
Catalyst performance prediction is hindered by the scarcity of experimental data. Method: This paper proposes a cheminformatics-driven domain transformation method that enables efficient transfer of first-principles computational data to the experimental space. It integrates statistical ensemble theory with prior physical relationships between source (computational) and target (experimental) quantities, establishing an interpretable heterogeneous domain alignment framework. Chemical knowledge guides domain adaptation to explicitly mitigate the distributional shift between computational and experimental data. Contribution/Results: The method achieves prediction accuracy comparable to models trained from scratch on over 100 experimental samples while using fewer than 10 experimental data points. This reduces experimental trial-and-error costs and establishes a generalizable transfer learning paradigm for small-sample inverse materials design.
Abstract
Simulation-to-Real (Sim2Real) transfer learning, a machine learning technique that efficiently solves a real-world task by leveraging knowledge from computational data, has received increasing attention in materials science as a promising solution to the scarcity of experimental data. We propose an efficient transfer learning scheme from first-principles calculations to experiments based on a chemistry-informed domain transformation that integrates the heterogeneous source and target domains by harnessing the underlying physics and chemistry. The proposed method maps the computational data from the simulation space (source domain) into the space of experimental data (target domain). During this process, these qualitatively different domains are integrated using two pieces of prior chemical knowledge: (1) the statistical ensemble, and (2) the relationship between source and target quantities. As a proof of concept, we predict the catalyst activity for the reverse water-gas shift reaction by using abundant first-principles data in addition to the experimental data. Through this demonstration, we confirmed that the transfer learning model exhibits positive transfer in both accuracy and data efficiency. In particular, high accuracy was achieved despite using only a few (fewer than ten) target data points in the domain transformation; this accuracy is one order of magnitude higher than that of a full scratch model trained with over 100 target data points. This result indicates that the proposed method achieves high prediction performance with few target data, which helps reduce the number of trials in real laboratories.
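The two pieces of prior knowledge named above can be illustrated with a minimal sketch. The abstract does not give the functional forms, so everything below is an assumption made for illustration only: the statistical ensemble is taken to be a Boltzmann-weighted average over computed site energies, and the source-target relationship is taken to be a log-linear (Arrhenius-like) map whose two parameters are fitted on a handful of experimental points. All function names, variable names, and numbers are hypothetical toy values, not the authors' implementation or data.

```python
import numpy as np

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_average(energies_ev, temperature_k):
    """Assumed ensemble step: Boltzmann-weighted mean of computed energies."""
    e = np.asarray(energies_ev, dtype=float)
    # Shift by the minimum energy so the exponentials stay numerically stable.
    weights = np.exp(-(e - e.min()) / (K_B_EV_PER_K * temperature_k))
    return float(np.sum(weights * e) / np.sum(weights))


def fit_log_linear(descriptor, log_activity):
    """Assumed source-target relation: log(activity) = slope * x + intercept.

    Only two parameters are fitted, so a few target points are enough.
    """
    x = np.asarray(descriptor, dtype=float)
    y = np.asarray(log_activity, dtype=float)
    design = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(slope), float(intercept)


def transform_to_target(descriptor, slope, intercept):
    """Map source-domain descriptors into the (log) target space."""
    return slope * np.asarray(descriptor, dtype=float) + intercept


# Toy usage with made-up numbers (hypothetical, for illustration only).
site_energies = {
    "cat_A": [-0.50, -0.42, -0.30],
    "cat_B": [-0.80, -0.75],
    "cat_C": [-0.20, -0.10, -0.05],
}
temperature = 600.0  # K, assumed reaction temperature
descriptors = {
    name: boltzmann_average(e, temperature) for name, e in site_energies.items()
}

# A few "experimental" log-activities (made up) for the fit.
x_few = [descriptors["cat_A"], descriptors["cat_B"], descriptors["cat_C"]]
log_y_few = [-1.0, -2.1, -0.3]
slope, intercept = fit_log_linear(x_few, log_y_few)

# Any computed descriptor can now be expressed in the target space.
print(transform_to_target(list(descriptors.values()), slope, intercept))
```

Under these assumptions, the transformed computational data live in the same space as the experimental measurements and could be pooled with them to train a downstream predictor. Because the physical relation carries only a couple of free parameters, very few target points are needed to calibrate it.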