🤖 AI Summary
In thermochemistry and materials science, experimental determination of thermodynamic properties such as enthalpies of formation is costly and yields fragmented, heterogeneous data. To address this bottleneck, we propose an end-to-end predictive framework that combines large language models (LLMs) with machine learning. We introduce LMExt, a literature-mining tool that uses LLMs to automatically extract and harmonize thermodynamic information from unstructured, multi-source scientific literature, enabling efficient construction of a mineral thermodynamic dataset. We then apply gradient boosting (CatBoost) to model structure–property relationships and predict key thermodynamic parameters such as the enthalpy of formation. The reported evaluation indicates improvements in both data curation efficiency and prediction accuracy, which could reduce reliance on costly experimental screening. This work outlines a scalable, automated approach to thermochemical database construction and property prediction.
📝 Abstract
New discoveries in chemistry and materials science demand an ever-growing body of prerequisite knowledge and experimental work, which gives machine learning (ML) a unique opportunity to accelerate research. Here, we demonstrate (1) the use of large language models (LLMs) for automated literature review and (2) the training of an ML model to predict chemical knowledge, specifically thermodynamic parameters. Our LLM-based literature review tool (LMExt) successfully extracted chemical information into a machine-readable structure, including stability constants for metal cation-ligand interactions and thermodynamic properties. It also handled broader data types (medical research papers and financial reports), effectively overcoming the challenges inherent in each domain. Using the thermodynamic data acquired autonomously in this way, we trained an ML model with the CatBoost algorithm to accurately predict thermodynamic parameters of minerals (e.g., enthalpy of formation). This work highlights the transformative potential of integrated ML approaches to reshape chemistry and materials science research.
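To make "machine-readable structure" concrete, the following is a minimal sketch of how LLM-extracted enthalpy values might be harmonized to a single unit before model training. It is not LMExt's actual schema or code: the JSON field names (`mineral`, `formula`, `value`, `unit`, `source`), the `EnthalpyRecord` type, and the `harmonize` function are all assumptions made for illustration, and the sample values are placeholders rather than measured data.

```python
import json
from dataclasses import dataclass
from typing import List

# 1 thermochemical calorie is defined as exactly 4.184 J.
KJ_PER_KCAL = 4.184


@dataclass(frozen=True)
class EnthalpyRecord:
    """Hypothetical harmonized record; not the schema used by LMExt."""
    mineral: str
    formula: str
    delta_hf_kj_mol: float  # enthalpy of formation in kJ/mol
    source: str


def harmonize(raw_json: str) -> List[EnthalpyRecord]:
    """Convert a JSON list of extracted values into records in kJ/mol.

    Entries whose unit is neither kJ/mol nor kcal/mol are skipped
    rather than guessed at.
    """
    records = []
    for item in json.loads(raw_json):
        unit = item["unit"].strip().lower()
        value = float(item["value"])
        if unit == "kj/mol":
            kj = value
        elif unit == "kcal/mol":
            kj = value * KJ_PER_KCAL
        else:
            continue  # unknown unit: leave for manual review
        records.append(
            EnthalpyRecord(item["mineral"], item["formula"], round(kj, 2), item["source"])
        )
    return records


# Placeholder input standing in for LLM output (illustrative values only).
SAMPLE = """[
  {"mineral": "example-A", "formula": "AX", "value": -100.0, "unit": "kcal/mol", "source": "paper-1"},
  {"mineral": "example-B", "formula": "BX2", "value": -250.0, "unit": "kJ/mol", "source": "paper-2"},
  {"mineral": "example-C", "formula": "CX3", "value": -3.0, "unit": "eV/atom", "source": "paper-3"}
]"""

records = harmonize(SAMPLE)
for record in records:
    print(record)
```

Records in this uniform shape could then be joined with composition or structure descriptors and passed to a gradient-boosting regressor such as CatBoost. That step is omitted here because the abstract does not specify the features used.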