🤖 AI Summary
Thermoelectric materials research has long been hindered by the absence of comprehensive, reliable, and structured databases. To address this, the authors present an open-access, scalable thermoelectric materials database covering 7,123 compounds, with curated data on chemical composition, crystal structure, Seebeck coefficient, electrical and thermal conductivity, power factor, and ZT values. Methodologically, the database is built with GPTArticleExtractor, an LLM-driven workflow that automates literature parsing and data curation for thermoelectrics by extracting full text from Elsevier publications and mapping the heterogeneous results into a structured form. This approach addresses the bottleneck of manual database construction. The database is intended to support data-driven studies of thermoelectric property prediction and optimization and to accelerate the discovery of high-performance thermoelectric materials.
📝 Abstract
Thermoelectric materials provide a sustainable way to convert waste heat into electricity. However, data-driven discovery and optimization of these materials are challenging because a reliable database is lacking. Here we developed a comprehensive database of 7,123 thermoelectric compounds, containing key information such as chemical composition, structural details, Seebeck coefficient, electrical and thermal conductivity, power factor, and figure of merit (ZT). We used the GPTArticleExtractor workflow, powered by large language models (LLMs), to extract and curate data automatically from the scientific literature published in Elsevier journals. This process enabled the creation of a structured database that addresses the challenges of manual data collection. The open-access database could stimulate data-driven research and advance thermoelectric material analysis and discovery.
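The quantities listed in the abstract are linked by standard definitions: the power factor is S²σ and the figure of merit is ZT = S²σT/κ, where S is the Seebeck coefficient, σ the electrical conductivity, κ the thermal conductivity, and T the absolute temperature. The sketch below shows how one record holding these fields might be represented and checked for internal consistency. The class name, field names, units, and numeric values are assumptions made for illustration; they are not the database's actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class ThermoelectricRecord:
    """Hypothetical record layout; not the schema used by the database."""

    composition: str              # chemical formula as a plain string
    seebeck: float                # Seebeck coefficient S, in V/K
    electrical_cond: float        # electrical conductivity sigma, in S/m
    thermal_cond: float           # thermal conductivity kappa, in W/(m*K)
    temperature: float            # measurement temperature T, in K

    def power_factor(self) -> float:
        """Power factor S^2 * sigma, in W/(m*K^2)."""
        return self.seebeck ** 2 * self.electrical_cond

    def zt(self) -> float:
        """Dimensionless figure of merit ZT = S^2 * sigma * T / kappa."""
        return self.power_factor() * self.temperature / self.thermal_cond


# Placeholder values chosen for easy arithmetic, not taken from the database.
example = ThermoelectricRecord(
    composition="ExampleCompound",
    seebeck=200e-6,
    electrical_cond=1.0e5,
    thermal_cond=1.5,
    temperature=300.0,
)

print(f"power factor = {example.power_factor():.3e} W/(m*K^2)")
print(f"ZT = {example.zt():.2f}")
```

Recomputing ZT from the stored S, σ, κ, and T in this way is one possible sanity check on automatically extracted values, since a reported ZT that disagrees strongly with the recomputed one may point to a unit or extraction error.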